Preface


This book is written for behavioral and social science students at the advanced un- 
dergraduate or beginning graduate level. The text emphasizes conceptual under- 
standing, the effective use of statistical software to run the analyses, and the correct 
interpretation of results. Two statistical software packages, SAS and SPSS, are an 
integral part of each chapter. An annotated printout is given from at least one of the 
programs for each analysis. The annotations highlight what the numbers mean and 
how to interpret the results. The explanation appears on the printout or on the same 
page to enhance learning efficiency. The assumptions underlying each analysis are 
given special attention, and the reader is shown how to test the critical assump- 
tion(s) using SAS and SPSS. Power analysis is an integral part of the book. There 
are no computational formulas in this text. I took the position that they were not 
needed many years ago, and it is even truer today. 

The instructional mix of strategies employed to illustrate each statistical
technique consists of two parts: (a) first, I use definitional formulas on small data
sets to convey conceptual insight into what is being measured, and (b) then I proceed
directly to the packages to process data efficiently. I feel very strongly about
using these strategies.

The most significant change in this edition is the addition of a chapter on hierar- 
chical linear modeling using HLM6. This material is important because correlated 
observations occur frequently in social science research and just a SMALL amount 
of dependence causes the type I error rate to be several times greater than one 
wishes! Since HLM involves a series of regressions, this new chapter is placed af- 
ter the material on regression. The distinction between fixed and random factors is 
important, and so it is emphasized. The chapter on HLM was written by Dr. 
Natasha Beretvas of the University of Texas at Austin. I thank her very much for 
her contribution. 

The third edition features newer versions of SPSS (Release 12.0) and SAS (Re- 
lease 8.0). Much of the material on importing data into SAS or SPSS that previ- 
ously appeared in chapter 1 was deleted. Importing data into these two programs is 
now much easier so this material was no longer necessary. 

The exercises involve a mixture of numerical, conceptual and computer related 
problems. I have de-emphasized purely numerical exercises, for I agree entirely 
with Cobb (1987, p. 323) that, “computing rules are just the skin of our subject; it is 
focus that reveals the skeleton of fundamental concepts and connections that hold
the body of knowledge together.” Regarding exercises, it is important to note that
there are 3 new exercises for each chapter. Answers are provided for half of the exercises,
and an Instructor's Solutions CD is available to adopters. A computer example
of real data integrates many of the concepts. A CD containing all of the
book's data sets is included in the back of the book. 

The reader should have a background of a one quarter course in statistics that 
covered at least the t tests for independent and dependent samples. 

I am very grateful to the reviewers of this text: Dale Berger of Claremont Grad- 
uate University, Michael Milburn of University of Massachusetts, Mary Lou 
Kerwin of Rowan University, Gordon Brooks of Ohio University, and Roderick 
Gillis of the University of Miami. I am also indebted to some individuals at my 
publisher. Larry Erlbaum continues to be very supportive. Debra Riegert was in- 
strumental in motivating me to write this third edition. 


Jim Stevens 


Introduction 


CONTENTS 

1.1 Focus and Overview of Topics
1.2 Some Basic Descriptive Statistics
1.3 Summation Notation
1.4 t Test for Independent Samples
1.5 t Test for Dependent Samples
1.6 Outliers
1.7 SPSS and SAS Statistical Packages
1.8 SPSS for Windows - Release 12.0
1.9 Data Files
1.10 Data Entry
1.11 Editing a Dataset
1.12 Splitting and Merging Files
1.13 Two Ways of Running Analyses on SPSS
1.14 SPSS Output Navigator
1.15 SAS and SPSS Output for Correlations, Descriptives, and t Tests
1.16 Data Sets on Compact Disk
Appendix: Obtaining the Mean and Variance on the TI-30Xa Calculator


1.1 FOCUS AND OVERVIEW OF TOPICS 


This book has been written for applied social science researchers at the advanced 
undergraduate or beginning graduate level. It is assumed that you have had a one 
quarter course in beginning statistics that covered measures of central tendency, 
measures of variability, standard scores (z, T, stanines, etc.), correlation, and inferential
statistics, including at least the t tests for independent and dependent samples.
In the next four sections of this chapter, we review briefly some descriptive
statistics, summation notation, and testing for a "significant" difference. These
sections are not intended to thoroughly teach this material again, but to refresh
your memory.

The emphasis in the book is on conceptual understanding of the statistical tech- 
niques, learning how to effectively use statistical software to run the analyses, and 
learning how to interpret the computer printout that results from such runs. The 
two major statistical packages, SAS (Statistical Analysis System) and SPSS (Sta- 
tistical Package for the Social Sciences), are an integral part of this book. Details 
on SAS and SPSS are given in Section 1.7. I have attempted to make the text as 
practical as possible. To accent the practical emphasis, nine real data sets have 
been provided in Appendix A in the back of the book. For convenience, these data 
sets are also available on a CD. Some of the exercises in the chapters involve run- 
ning these data sets, or a part of a real data set. Singer and Willett (1988) have pro- 
vided an excellent annotated bibliography, indicating where numerous other real 
data sets may be found. 

The instructional mix of strategies adopted to illustrate each statistical tech- 
nique involves two parts: 


1. First, we illustrate each technique using definitional formulas on small data 
sets. These formulas are useful in yielding conceptual insight into what is being 
measured or quantified. As a simple example, the definitional formula for sample 
variance is 


$$s^2 = [(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2]/(n - 1)$$


This formula shows very clearly that variance is measuring how much the 
scores for the subjects scatter or disperse about the mean. 

2. Then we move directly to the computer, that is, to the statistical packages,
to show how to process data efficiently and, more importantly, how to interpret the
printout from the packages. In practice, analyses will very likely be run on one or
more of these packages, and thus it is important to become familiar with them.


Now we give an overview of the topics in the book. The reader may recall 
that the t test for independent samples is appropriate for comparing two groups 
to determine whether they differ on the average on a dependent variable. But 
what if we wish to compare more than two groups simultaneously on a depend- 
ent variable? For example, we wish to compare the effect of four counseling 
methods on attitude toward education. Then a statistical procedure called analy- 
sis of variance is needed. This technique is covered in Chapter 2. Suppose that 
for this example there was reason to believe that the sex of the subjects might 
moderate the effect of the counseling methods, and we wanted to check this pos- 
sibility. This would lead us to a more complicated analysis of variance design, 
since we are examining the effect of two independent variables (sex and counseling
method) on attitude toward education. It is an example of a factorial design.
These designs are covered in Chapter 4. 

Chapter 3 deals with power analysis. The power of a statistical test is the proba- 
bility of rejecting the null hypothesis when it is false. Although it may seem obvi- 
ous that we would want to achieve this, many researchers in the literature have 
failed to do so, as Cohen (1969) and others have pointed out. The reason is that 
power is generally inadequate with small group sizes (especially with 20 or fewer
subjects per group), and in some areas of research such sample sizes are quite com- 
mon for pragmatic or other reasons. Chapter 3 provides a detailed and practical ap- 
proach to estimating the power of completed studies and also for estimating sam- 
ple size required for adequate power in an upcoming study. 

In Chapter 5 we treat the class of situations in which the same subjects are mea- 
sured more than twice on a dependent variable. For example, suppose a dietitian 
wishes to assess the immediate and long term effects of a behavior modification 
approach on weight loss for a group of overweight men. She measures the weight 
loss immediately following treatment and then 7 additional times (in three month 
intervals) over a two year period. The appropriate statistical analysis here is a dif- 
ferent type of analysis of variance from that in Chapter 2, called repeated measures 
analysis. The simplest case of a repeated measures design measures the subjects 
just twice, e.g., pretest-treatment-posttest. The investigator is interested in testing
for a significant gain or change on the dependent variable, and the appropriate
test is the t test for correlated (dependent) samples that you studied in beginning
statistics.

Chapter 6, which is a new addition to my intermediate text, deals with multiple 
regression. Much of the material is taken from my multivariate text (Stevens, 
1996). Multiple regression is a much used and abused technique. One of the prob- 
lems is that many researchers use multiple regression without validating their re- 
sults on an independent sample of data. I have made validating the model a major 
theme in this chapter. 

Analysis of covariance is now found in Chapter 7. This technique combines 
analysis of variance and regression analysis. Because of this, and because of the 
suggestions of two reviewers of this edition, I have placed covariance after regression and
ANOVA. A covariate is a variable that is significantly correlated with the dependent
variable. Analysis of covariance can be quite helpful in randomized studies,
that is, studies where the subjects have been randomly assigned to the treatments, 
in increasing the sensitivity (power) of an experiment. 

Analysis of variance procedures and multiple regression are used very often in 
the literature. Thus it is important to learn this material in order to be able to intelli- 
gently and critically read the literature. 


1.2 SOME BASIC DESCRIPTIVE STATISTICS


The measure of central tendency that is used most frequently is the mean or aver- 
age for a set of scores. It is defined as 


$$\bar{x} = (x_1 + x_2 + \cdots + x_n)/n$$


where n is the number of subjects, $x_1$ is the score for subject 1 on variable x,
$x_2$ is the score for subject 2, etc. The mean is an example of a summary statistic: it
summarizes an important or salient feature of a set of data. For example, if you are 
told that the average weight for a pro football lineman is 280 pounds, or that the av- 
erage income of people living in a certain community is $80,000, each of these 
numbers packs a message. The average weight of 280 pounds indicates that the 
weights of most linemen tend to cluster around that value, and the income of 
$80,000 means that the incomes of most people in that community cluster around 
$80,000. These statements are accurate provided that there are no extreme values 
or outliers (see Section 1.6). 

Although the mean is useful in characterizing one important feature of a set of 
data, it can be misleading just by itself. To see this consider the following scores 
for three groups of 10 children each on a 20 item pretest in mathematics: 


Group 1    Group 2    Group 3
   10         10         10
   13         11         18
    7         11          2
   12         10         13
   13         12         17
   11         11          3
    8         11          8
   14         10         15
    9         12         19
   12         11          4

$\bar{x}_1 = 10.9$    $\bar{x}_2 = 10.9$    $\bar{x}_3 = 10.9$


On the average there is no difference between these three groups of children. 
However, there is a major difference among the groups in terms of variability of the 
scores about the mean. One can see intuitively that there is the least variability for 
group 2 (since the scores cluster very tightly about the mean of 10.9), while vari- 
ability is greatest for group 3. This differential amount of variability would have 
definite instructional implications, if you had to teach one of these three groups 
mathematics. Other things being equal, group 2 would be easier to teach since they 
are all at about the same level of ability. 

To quantify the amount of variability in a set of scores we use the sample variance
$s^2$, the definitional formula of which is


$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$


Notice that variance simply measures how much the scores vary about the 
mean. Now we find the variances for the three groups of children. Although the 
emphasis in this book is on using the computer for doing statistical analysis, there 
is a wide array of very inexpensive calculators that are conveniently used for calculating
the mean and variance for a set of data. In Appendix 1 at the end of this chapter
we give the details for the TI-30Xa for the children in group 1. The variances
for the three groups are: $s_1^2 = 5.43$, $s_2^2 = .54$, and $s_3^2 = 41.43$.
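
As a check on these values, the definitional formula can be applied directly. The following Python sketch is not part of SAS or SPSS; it is only a minimal illustration, and the names `groups`, `sample_variance`, and `summary` are our own. The scores are transcribed from the table above, with the eighth score in group 3 taken to be 15, the value that reproduces the reported mean of 10.9.

```python
# Pretest scores for the three groups of 10 children.
# Assumption: the eighth score in group 3 is 15 (this reproduces the mean of 10.9).
groups = {
    "Group 1": [10, 13, 7, 12, 13, 11, 8, 14, 9, 12],
    "Group 2": [10, 11, 11, 10, 12, 11, 11, 10, 12, 11],
    "Group 3": [10, 18, 2, 13, 17, 3, 8, 15, 19, 4],
}


def sample_mean(scores):
    """Sum of the scores divided by the number of subjects."""
    return sum(scores) / len(scores)


def sample_variance(scores):
    """Definitional formula: squared deviations about the mean over (n - 1)."""
    m = sample_mean(scores)
    return sum((x - m) ** 2 for x in scores) / (len(scores) - 1)


# Map each group name to its (mean, variance) pair.
summary = {name: (sample_mean(s), sample_variance(s)) for name, s in groups.items()}

for name, (m, v) in summary.items():
    print(f"{name}: mean = {m:.1f}, variance = {v:.2f}")
```

The three means agree while the variances differ by almost two orders of magnitude, which is the point of the example: the mean alone does not describe a set of scores.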

Summary statistics like the mean and variance are especially useful in compar- 
ing different data sets (groups of subjects) on the same variable. Consider the fol- 
lowing two sets of scores, which represent the age of 25 automobile salesmen in 
the United States and 25 automobile salesmen in Western Europe: 


     United States            Western Europe
23  63  25  22  32        43  26  30  27  40
56  30  34  56  30        35  48  36  47  41
25  48  44  27  26        34  45  30  38  33
38  26  30  39  30        35  44  24  33  40
36  32  36  38  33        31  23  29  37  28


It is far from obvious by just looking at these sets how the ages for the two 
groups differ, if at all. Computation of the mean and variance for the groups yields: 
U.S. ($\bar{x} = 35.16$, $s^2 = 117.22$) and Western Europe ($\bar{x} = 35.08$, $s^2 = 51.16$). These
statistics indicate that the average age is about the same for the groups and that the
variability in age for the U.S. salesmen is over twice that for the Western Europeans.


1.3 SUMMATION NOTATION 


The reader probably was exposed to the summation operator in an introductory statistics
course. Nevertheless, a brief review of some basic properties of $\sum$ (sigma)
will be helpful. The symbol $\sum$ means “take the sum of.” Suppose we had measured
50 subjects on anxiety. The sum of their scores is


$$x_1 + x_2 + x_3 + \cdots + x_{50}$$


This sum can be expressed concisely using $\sum$ as follows:

$$\sum_{i=1}^{50} x_i$$


The first term ($x_1$) is obtained by setting i = 1, the second term ($x_2$) by setting i =
2, on down to the last term ($x_{50}$) for i = 50. The quantity i is called the index of summation;
it is what we are summing on. Let us consider a few more examples to illustrate.
Suppose we have measured 75 subjects on a variable y and wish to
represent the sum of those scores using $\sum$. Then it would look like this:


$$y_1 + y_2 + \cdots + y_{75} = \sum_{i=1}^{75} y_i$$


Or if we had 100 subjects measured on variable z and wish the sum of the scores 
for subjects 3 through 100, then we have 


$$z_3 + z_4 + \cdots + z_{100} = \sum_{i=3}^{100} z_i$$


If the limits are understood, then they are dropped, and we would just write $\sum z_i$.
Note that the mean for a set of n scores can be written using $\sum$:


$$\bar{x} = (x_1 + x_2 + \cdots + x_n)/n = \sum_{i=1}^{n} x_i / n$$


Often we may wish to concisely represent a sum of squares of some type. Suppose
we have n subject scores ($x_1, x_2, \ldots, x_n$) and wish to denote the sum of the
squared scores. This is


$$x_1^2 + x_2^2 + \cdots + x_n^2 = \sum_{i=1}^{n} x_i^2$$


The sample variance for a set of scores involves a sum of squares (squared devi- 
ations): 


$$s^2 = [(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2]/(n - 1) = \sum_{i=1}^{n} (x_i - \bar{x})^2/(n - 1)$$


or as $s^2 = \sum (x - \bar{x})^2/(n - 1)$ if the limits are understood.


Example 


Evaluate $\sum x_i^2$, where $x_1 = 10$, $x_2 = 8$, $x_3 = 13$, and $x_4 = 5$.

$$\sum x_i^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 = 10^2 + 8^2 + 13^2 + 5^2 = 358$$


The following four properties of the summation operator are useful to know: 


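1. $\sum (x_i + y_i) = \sum x_i + \sum y_i$ (the summation of a sum is the sum of the summations)

2. $\sum (x_i - y_i) = \sum x_i - \sum y_i$ (the summation of a difference is the difference in the summations)

3. $\sum c x_i = c \sum x_i$ (a constant c can be moved across the summation)

To show that property 3 holds note that

$$\sum c x_i = c x_1 + c x_2 + \cdots + c x_n = c(x_1 + x_2 + \cdots + x_n) = c \sum x_i$$

4. $\sum c = nc$ (summing over n subjects)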
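
The four properties are easy to confirm numerically. The sketch below is in Python rather than SAS or SPSS and is meant only as an illustration; the scores in `x` come from the example above, while `y` and the constant `c` are hypothetical values chosen for the check.

```python
x = [10, 8, 13, 5]   # scores from the worked example above
y = [3, 1, 4, 2]     # hypothetical scores on a second variable
c = 7                # hypothetical constant
n = len(x)

# Sum of the squared scores, as in the worked example.
sum_sq = sum(xi ** 2 for xi in x)

# Property 1: the summation of a sum is the sum of the summations.
prop1 = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)

# Property 2: the summation of a difference is the difference in the summations.
prop2 = sum(xi - yi for xi, yi in zip(x, y)) == sum(x) - sum(y)

# Property 3: a constant can be moved across the summation.
prop3 = sum(c * xi for xi in x) == c * sum(x)

# Property 4: summing a constant over n subjects gives n times the constant.
prop4 = sum(c for _ in range(n)) == n * c

print(sum_sq, prop1, prop2, prop3, prop4)
```

Integer scores are used so that the equalities hold exactly; with decimal scores the two sides may differ by a tiny rounding error.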


The constant c mentioned in properties 3 and 4 can appear in many different 
subtle ways. To illustrate that and also to show how to apply several of the above 
properties, we will prove that the mean of a set of z scores is 0. 

Denote the z scores by $z_1, z_2, \ldots, z_n$. Then by definition of a mean we have


$$\bar{z} = \sum z_i / n$$


To show that $\bar{z} = 0$ it suffices to show that $\sum z_i = 0$.
By definition $z_i = (x_i - \bar{x})/s$. Therefore by substitution:


$$\sum z_i = \sum (x_i - \bar{x})/s$$


Note that 1/s is a constant here; that is, it does not depend on i (index of summation).
By property 3 we can move it across the summation and write


$$\sum z_i = (1/s) \sum (x_i - \bar{x})$$


Now by property 2 we can further rewrite this as


$$\sum z_i = (1/s)\left[\sum x_i - \sum \bar{x}\right]$$

Next, $\bar{x}$ is a constant and thus by property 4 we have that $\sum \bar{x} = n\bar{x}$. Also,
since $\bar{x} = \sum x_i / n$ (by definition), this implies that $\sum x_i = n\bar{x}$. Plugging these values
in we obtain


$$\sum z_i = (1/s)[n\bar{x} - n\bar{x}] = (1/s) \cdot 0 = 0$$
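
The result can also be seen numerically. The following Python sketch (an illustration only, not SAS or SPSS output) converts a small set of scores to z scores and averages them; the scores in `x` are hypothetical, and any set of scores with a nonzero standard deviation would serve.

```python
import math

x = [10, 8, 13, 5]  # hypothetical scores
n = len(x)

# Mean and standard deviation from the definitional formulas.
x_bar = sum(x) / n
s = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))

# z score for each subject: deviation from the mean in standard deviation units.
z = [(xi - x_bar) / s for xi in x]

# Mean of the z scores; this should be 0 apart from rounding error.
z_bar = sum(z) / n
print(z_bar)
```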


1.4 t TEST FOR INDEPENDENT SAMPLES 


As an example we consider a study by air force psychologists conducting research
into the relative effectiveness of two methods of training pilots. The first method makes use of
computer simulated flight while the second uses traditional flight instruction. The 
18 subjects were randomly assigned to the two methods and the following perfor- 
mance test scores were obtained: 


Computer Simulation    Flight
         2                1
         5                1
         5                2
         6                3
         6                3
         7                4
         8                5
         9                7
                          7
                          8


We wish to test at the α = .05 level of significance whether the average performance
for the two groups is different. Recall that we wish to test the null hypothesis
($H_0$) that the population means are equal:


$$H_0: \mu_1 = \mu_2$$


It is called the null hypothesis because saying the population means are equal is
equivalent to saying that the difference in the means is 0, i.e., $\mu_1 - \mu_2 = 0$, or that the
difference is null.

Remember that the level of significance is our probability of making a type I error.
A type I error is rejecting the null hypothesis when it is true, that is, saying
the groups differ when they don't. This type of error cannot be eliminated;
however, we can and do control the risk by setting α = .05 or .01. Then there is only
a 5% or 1% chance of making this type of error.

It should be recalled that the t test is based on the following three assumptions:

1. Normality: the scores on the dependent variable are normally distributed
in each group.

2. Homogeneity of variance: the population variances are equal for the two
groups.

3. Independence of the observations: each subject's score on the dependent
variable is not affected by other subjects in the same treatment group.


Briefly, considerable research has shown that a violation of the normality as- 
sumption is of little consequence. Unequal variances will distort the type I error 
rate appreciably only if the group sizes are sharply unequal (largest/smallest > 1.5). 
Finally, dependent observations have a very serious effect on type I error rate. We 
discuss violations of assumptions in considerable detail in Chapter 2. 

To test $H_0$ we use the following t statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 (1/n_1 + 1/n_2)}} \quad \text{with } (n_1 + n_2 - 2)\ df \qquad (1)$$


where $s_p^2 = [(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2]/(n_1 + n_2 - 2)$ is the pooled estimate of the assumed
common population variance for the groups (the homogeneity of variance assumption).
Now, $s_1^2$ and $s_2^2$ are the sample variances for groups 1 and 2, while $n_1$ and $n_2$
are the respective group sizes. This test statistic can be calculated relatively easily
by obtaining the mean and variance for each group with the TI-30Xa or some other
calculator. With the variances obtained, we find that


$$s_p^2 = [(8 - 1)4.57 + (10 - 1)6.54]/(10 + 8 - 2) = 5.68$$

Using Equation 1 we calculate


$$t = \frac{6 - 4.1}{\sqrt{5.68(1/8 + 1/10)}} = \frac{1.9}{1.13} = 1.68$$


Recall that we decided to reject $H_0$ only if the value of t obtained was very unlikely
(would occur only 5% of the time) under the assumption of equal population
means. The sampling distribution of t values (under the null hypothesis of equal
population means) for this case is shown in the figure that follows.

From the figure we can see that only 5% of the time (2.5% in each tail) will we obtain a t value
greater than 2.12 or less than -2.12 if the null hypothesis is true. The 2.12 and
-2.12 are called critical values because they are critical or pivotal points for our decision
on $H_0$. Note that the critical values define the critical regions, where rejection
of $H_0$ occurs. In general, if the value of t is greater than (in absolute value) the
critical value, we will reject $H_0$; otherwise, we fail to reject. In this case, since t =
1.68 is not in the critical region, we fail to reject $H_0$.

[Figure: sampling distribution of t (under $H_0$) for 16 df. The critical values -2.12 and 2.12 bound the fail-to-reject region, which contains the obtained t = 1.68; $H_0$ is rejected in either tail.]
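
The hand calculation for the pilot training data can be retraced step by step. The Python sketch below is only an illustration and is not one of the packages used in this book; the variable and function names are our own, and the critical value 2.12 is simply taken from the text rather than computed.

```python
import math

# Performance scores for the two training methods.
simulation = [2, 5, 5, 6, 6, 7, 8, 9]
flight = [1, 1, 2, 3, 3, 4, 5, 7, 7, 8]


def sample_mean(scores):
    return sum(scores) / len(scores)


def sample_variance(scores):
    """Definitional formula with (n - 1) in the denominator."""
    m = sample_mean(scores)
    return sum((x - m) ** 2 for x in scores) / (len(scores) - 1)


n1, n2 = len(simulation), len(flight)
m1, m2 = sample_mean(simulation), sample_mean(flight)
v1, v2 = sample_variance(simulation), sample_variance(flight)

# Pooled estimate of the assumed common population variance.
pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)

# Denominator of the t statistic, then the t statistic itself (Equation 1).
std_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
t = (m1 - m2) / std_error

critical = 2.12  # two tailed .05 critical value for 16 df, as given in the text
reject = abs(t) > critical

print(f"pooled variance = {pooled:.2f}, t = {t:.2f}, reject H0: {reject}")
```

Because the code carries full precision through every step, intermediate values may differ from the hand calculation in the third decimal place, but the rounded results and the decision are the same.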


The null hypothesis could have also been tested using a confidence interval. 
Confidence intervals are an important part of inferential statistics. The confidence 
interval will give us a range of values within which the population mean difference 
lies with a certain probability (or confidence). For the above t test for independent
samples the confidence interval is given by:

$$(\bar{x}_1 - \bar{x}_2) - t_{.05;df}\, s_{\bar{x}_1 - \bar{x}_2} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{.05;df}\, s_{\bar{x}_1 - \bar{x}_2}$$

where $t_{.05;df}$ denotes the two tailed critical value at .05 with $(n_1 + n_2 - 2)$ degrees of
freedom and $s_{\bar{x}_1 - \bar{x}_2}$ is the denominator of the t statistic. Thus, the 95% confidence
interval for the above problem is given by


$$1.9 - 2.12(1.13) < \mu_1 - \mu_2 < 1.9 + 2.12(1.13)$$
$$-.496 < \mu_1 - \mu_2 < 4.296$$


Since this interval covers (crosses) 0, this means 0 is a possible value for $\mu_1 - \mu_2$,
which means it is possible that $\mu_1 - \mu_2 = 0$ or that $\mu_1 = \mu_2$. Since it is possible that
the population means are equal we would not reject the null hypothesis. On the 
other hand, if the confidence interval does not cross 0 then we conclude there is a 
significant difference between the groups, because this would mean 0 is not a pos- 
sible value for the population mean difference. Confidence intervals are more in- 
formative than a test of significance because they not only test the null hypothesis 
but also give us a range of values that is useful in judging the practical significance 
of results. We discuss the practical significance of results more in Chapter 2 on one 
way analysis of variance. 
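
The interval arithmetic above is short enough to verify directly. This Python sketch is an illustration only; it starts from the rounded values used in the text (mean difference 1.9, standard error 1.13, critical value 2.12), and the variable names are our own.

```python
mean_diff = 1.9    # difference in the sample means (6 - 4.1)
std_error = 1.13   # denominator of the t statistic, rounded as in the text
critical = 2.12    # two tailed .05 critical value for 16 df

# Half-width of the interval, then its lower and upper limits.
half_width = critical * std_error
lower = mean_diff - half_width
upper = mean_diff + half_width

# The interval leads to the same decision as the t test:
# if it covers 0, we fail to reject the null hypothesis.
covers_zero = lower < 0 < upper

print(f"{lower:.3f} < mu1 - mu2 < {upper:.3f}; covers 0: {covers_zero}")
```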


1.5 t TEST FOR DEPENDENT SAMPLES


The t test for dependent samples is appropriate in a variety of situations, of which
the following three are common:


a. Pretest-treatment-posttest.

b. Two groups of matched or paired subjects are compared on some dependent
variable. For example, 16 girl beginners are matched on SES, I.Q.,
number of children in the family, and general health. Eight of the girls had
attended kindergarten; the other 8 had not. We wish to determine whether
they differ on a test of first grade readiness.

c. We are comparing naturally occurring correlated pairs, such as twins, husband
and wife, parent and child, etc.


Our numerical example does not fit into the above three categories. 


Example 


A political candidate wishes to determine if endorsing increased social spending is 
likely to affect her standing in the polls. She has access to data on the popularity of 
several other candidates who have endorsed social spending. The data were avail- 
able both before and after the candidates announced their positions on the issue, as 
follows: 


                 Popularity

Candidate    Before    After    $d_i$
    1          42        43       1
    2          41        45       4
    3          50        56       6
    4          52        54       2
    5          58        65       7
    6          32        29      -3
    7          39        46       7
    8          42        48       6
    9          48        47      -1
   10          47        53       6


The $d_i$ are the difference scores in popularity and are fundamental in defining
the test statistic for correlated samples:


$$t = \frac{\bar{d}}{s_d/\sqrt{n}} \quad \text{with } (n - 1)\ df \qquad (2)$$

where $\bar{d}$ is the average difference score, $s_d$ is the standard deviation for the difference
scores, and n is the number of subjects or matched pairs. By using the
TI-30Xa calculator on the difference scores above one obtains $\bar{d} = 3.5$ and $s_d = 3.57$.
Therefore, t is calculated as

$$t = \frac{3.5}{3.57/\sqrt{10}} = 3.097$$

The critical value at the .05 level is $t_{.05;9} = 2.262$. Since the value of the test statistic is
greater than the critical value, we reject the null hypothesis and conclude that mean
popularity after endorsement is greater than the mean popularity before endorsement.

Note that the mean difference is equal to the difference in the means, as we
show below, where $x_a$ and $x_b$ denote the scores after and before:


$$\bar{d} = \sum d_i/n = \sum (x_a - x_b)/n = \sum x_a/n - \sum x_b/n = \bar{x}_a - \bar{x}_b$$


If we rewrite the equation for the t test for independent samples as


$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}$$


then by placing this side by side with the t test for dependent samples we can see
that they are structurally identical:


Independent t: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}$

Dependent t: $t = \dfrac{\bar{x}_a - \bar{x}_b}{s_d\sqrt{1/n}}$


The numerator in each case involves an estimate of the difference in the means; 
in the first case for the two groups and in the second case for the matched pairs or 
subjects on two different occasions. In the denominators, $s_p$ and $s_d$ provide estimates
of the amount of sampling error for each mean difference.
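
The dependent samples calculation for the candidate popularity data can be retraced in the same way. The Python sketch below is an illustration only (it is not SAS or SPSS), the variable names are our own, and the critical value 2.262 is taken from the text rather than computed.

```python
import math

# Popularity of the 10 candidates before and after endorsing social spending.
before = [42, 41, 50, 52, 58, 32, 39, 42, 48, 47]
after = [43, 45, 56, 54, 65, 29, 46, 48, 47, 53]

# Difference score for each candidate (after minus before).
d = [a - b for a, b in zip(after, before)]
n = len(d)

# Mean and standard deviation of the difference scores.
d_bar = sum(d) / n
s_d = math.sqrt(sum((di - d_bar) ** 2 for di in d) / (n - 1))

# t statistic for correlated samples (Equation 2), with n - 1 = 9 df.
t = d_bar / (s_d / math.sqrt(n))

# The mean difference equals the difference in the means.
diff_in_means = sum(after) / n - sum(before) / n

critical = 2.262  # two tailed .05 critical value for 9 df, as given in the text
reject = abs(t) > critical

print(f"d_bar = {d_bar}, s_d = {s_d:.2f}, t = {t:.2f}, reject H0: {reject}")
```

Carried at full precision the statistic comes out at about 3.10 rather than 3.097; the small discrepancy reflects rounding of intermediate values in the hand calculation and does not affect the decision.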


1.6 OUTLIERS


An outlier is a data point which splits off or is very different from the rest of the 
data. Specific examples of outliers would be an I.Q. of 160 (among normal subjects),
or a weight of 350 lbs. in a normal population of subjects. It is very important
to detect outliers because they can have a dramatic effect on the results of any
statistical analysis. 

Outliers can occur because of two fundamental reasons: 


1. a data recording or entry error was made, or 
2. the subjects are different from the rest. 

The first type of outlier can be identified by always listing the data and checking
to make sure the data has been read in accurately. Consider the following small 
data set with two variables: 


Subject      X1        X2      Zscore(X1)    Zscore(X2)
   1       101.00     68.00     -.25078        .53882
   2        92.00     46.00     -.77566       -.94293
   3        90.00     50.00     -.89230       -.67352
   4       107.00     59.00      .09914       -.06735
   5        98.00     50.00     -.42574       -.67352
   6       150.00     66.00     2.60691        .40411
   7       108.00     54.00      .15746       -.40411
   8       110.00     51.00      .27410       -.60617
   9       103.00     59.00     -.13414       -.06735
  10        94.00     97.00     -.65902       2.49202
Total N      10        10         10            10


Do you see any outlier(s) for x1 or x2? Subject 6 is an outlier for x1; notice how
150 splits off dramatically from the rest of the scores, which fall in the range from 
90 to 110. Subject 10 is an outlier for x2, since the score of 97 splits off sharply 
from the rest of the scores, which fall mostly in the range from about 50 to the mid 
60s. 

The z scores make the outliers quite apparent, and the z scores are very high 
since Shiffler (1988) has shown that the largest possible z score in a sample of size 
10 is 2.846. We elaborate on this later on in the section. 

You actually encountered the notion of an outlier in a beginning statistics 
course, although it may not have been called that by the instructor. In discussing 
measures of central tendency, your instructor probably indicated that whenever 
you have extreme scores in a set of data, the median should be used to characterize 
the data, rather than the mean. Extreme scores are called outliers here. The reason 
you were told to use the median is that it is essentially unaffected by extreme 
scores whereas the mean is drastically affected. Consider the following set of data: 
2, 3, 5, 6, 44. The last number is an outlier. If we were to use the mean (12), it 
would be quite misleading in characterizing the data set, as there are no scores 
around 12. The median, on the other hand, is 5 and does indicate where most of the 
scores lie (although there are only 5 of them). 

To show the dramatic effect an outlier can have on a correlation, consider the 
two scatterplots in Figure 1.1. Notice how inclusion of the outlier in each case 
drastically changes the interpretation of the results. For Case A there is no relation- 
ship without the outlier but there is a strong relationship with the outlier, while for 
Case B the relationship changes from strong (without outlier) to weak. 

FIGURE 1.1 The Effect of an Outlier on a Correlation Coefficient.

From the above it should be clear that it is very important to identify outliers and
then decide what to do about them. Why? Because we want our analysis results re- 
flecting most of the data, and not being unduly influenced by just 1 or 2 errant 
points. 


Detecting Outliers 


If the variable is approximately normally distributed, then z scores around 3 in absolute
value should be considered as potential outliers. Why? Because in an approximately
normal distribution about 99% of the scores should lie within three standard
deviations of the mean. Therefore, any z value > 3 in absolute value indicates a value very
unlikely to occur. Of course, if n is large (say > 100), then simply by chance we
might expect a few subjects to have z scores > 3 and this should be kept in mind.
However, the above rule is reasonable for any type of distribution, although
we might consider extending the rule to z > 4. It was shown many years ago that, regardless
of how the data are distributed, the percentage of observations that are
contained within k standard deviations of the mean must be at least $(1 - 1/k^2) \cdot 100\%$.
The above holds only for k > 1.

Shiffler (1988) has shown that the largest possible z value in a data set of
size n is bounded by $(n - 1)/\sqrt{n}$. This means for n = 10 the largest possible z is
2.846 and for n = 11 the largest possible z is 3.015. Thus, for small sample size any
data point with a z around 2.5 should be seriously considered as a possible outlier.
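
To make the screening concrete, the z scores for the small two variable data set given earlier in this section can be recomputed and compared with the bound. The Python sketch below is an illustration only, not output from SAS or SPSS; the function names `z_scores`, `most_extreme`, and `max_possible_z` are our own.

```python
import math

# Scores for the 10 subjects on the two variables.
x1 = [101, 92, 90, 107, 98, 150, 108, 110, 103, 94]
x2 = [68, 46, 50, 59, 50, 66, 54, 51, 59, 97]


def z_scores(scores):
    """Convert scores to z scores using the sample standard deviation."""
    n = len(scores)
    m = sum(scores) / n
    s = math.sqrt(sum((x - m) ** 2 for x in scores) / (n - 1))
    return [(x - m) / s for x in scores]


def most_extreme(scores):
    """Return (subject number, z) for the score farthest from the mean."""
    z = z_scores(scores)
    index = max(range(len(z)), key=lambda i: abs(z[i]))
    return index + 1, z[index]


def max_possible_z(n):
    """Largest possible z in a sample of size n: (n - 1) / sqrt(n)."""
    return (n - 1) / math.sqrt(n)


subject_x1, z_x1 = most_extreme(x1)
subject_x2, z_x2 = most_extreme(x2)
bound = max_possible_z(len(x1))

print(f"x1: subject {subject_x1}, z = {z_x1:.3f}")
print(f"x2: subject {subject_x2}, z = {z_x2:.3f}")
print(f"largest possible z for n = {len(x1)}: {bound:.3f}")
```

Both extreme z scores are close to the largest value attainable with only 10 subjects, which is why a cutoff near 2.5 rather than 3 is sensible for samples this small.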


1.7 SPSS AND SAS STATISTICAL PACKAGES 


The Statistical Analysis System (SAS) and the Statistical Package for the Social 
Sciences (SPSS) were selected for use in this text for several reasons. 


1. They are very widely distributed. 

2. They are easy to use. 

3. They do a very wide range of analyses—from simple descriptive statistics 
to various analysis of variance designs to complex multivariate analyses. 

4. They are well documented, having been in development for over two de- 
cades. 


The control language that is used by both packages is quite natural, and you will 
see that with a little practice complex analyses are run quite easily, and with a small 
set of control line instructions. A major change from the previous edition of this 
text is the advent of Windows and running analyses by simply clicking a series of 
buttons. It is assumed that the reader will be either running a Windows version of
one or both of these packages on a desktop computer, or perhaps a notebook computer,
or running the analyses from the program editor (as it is called in SAS) or from
the syntax editor (as it is called in SPSS). We illustrate SPSS for Windows 12.0 in
some detail. Examples are considered where the data is part of the control lines.

Structurally, an SAS program is composed of three fundamental blocks:


1. Statements setting up the data. 

2. The data lines. 

3. Procedure (PROC) statements—procedures are SAS computer programs 
which read the data and do various statistical analyses. 


To illustrate how to set up the control lines, suppose we wish to compute the
correlations between locus of control, achievement motivation, and achievement
in language for a hypothetical set of 9 subjects. First we create a data set and give it
a name. The name must begin with a letter and be 8 or fewer characters. Let us call
the data set LOCUS. Now, each SAS statement must end with a semicolon. So our
first SAS line looks like this:


DATA LOCUS; 


The next statement needed is called an INPUT statement. This is where we give
names for our variables and indicate the format of the data (i.e., how the data is
arranged on each line). We will use what is called free format. With this format the
scores for each variable do not have to be in specific columns. However, at least
one blank column must separate the score for each variable from the next variable.
Furthermore, we will put in our INPUT statement the following symbols: @@. In
SAS this set of symbols allows you to put the data for more than one subject on the
same line.

In SAS, as with the other packages, there are certain rules for variable names.
Each variable name must begin with a letter and be 8 or fewer characters. The
variable name can contain numbers, but not special characters or embedded
blanks. For example, I.Q., x1 + x2, and SOC CLAS are not valid variable names.
We have special characters in the first two names (periods in I.Q. and the + in x1 +
x2) and there is an embedded blank in the abbreviation for social class.
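
The naming rule can be written as a one-line pattern. The sketch below is Python rather than SAS, and it encodes only the simplified rule stated in this text (a letter first, at most 8 characters, letters and digits only). The packages themselves accept a few additional characters (SAS, for instance, allows an underscore), so treat this as an illustration of the rule, not as either package's exact validator. The helper name is hypothetical.

```python
import re

# Simplified rule from the text: starts with a letter, at most 8 characters,
# and contains only letters and digits (no special characters, no blanks).
_NAME_RULE = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")


def is_valid_name(name: str) -> bool:
    """Return True if `name` satisfies the simplified naming rule."""
    return _NAME_RULE.fullmatch(name) is not None


for candidate in ["LOCUS", "ACHMOT", "X1", "I.Q.", "x1 + x2", "SOC CLAS", "ACHIEVEMENT"]:
    verdict = "valid" if is_valid_name(candidate) else "not valid"
    print(f"{candidate!r}: {verdict}")
```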

Our INPUT statement is as follows: 


INPUT LOCUS ACHMOT ACHLANG @@;


Following the INPUT statement there is a LINES statement, which tells SAS 
that the data is to follow. Thus, the first three statements here setting up the data 
look like this: 


DATA LOCUS;
INPUT LOCUS ACHMOT ACHLANG @@;
LINES;


Recall that the next structural part of a SAS program is the set of data lines.
Remember there are three variables, so we have three scores for each subject. We will
put the scores for three subjects on each data line. Adding the data lines to the
above three statements, we now have the following part of the SAS program:


DATA LOCUS;
INPUT LOCUS ACHMOT ACHLANG @@;
LINES;
11 23 31 13 25 38 21 28 29
21 34 28 14 36 37 29 20 37
17 24 39 19 30 39 23 28 41


The first 3 scores (11, 23, and 31) are the scores on locus of control, achieve- 
ment motivation, and achievement in language for the first subject; the next 3 num- 
bers (13, 25, and 38) are the scores on these variables for subject 2; etc. 

Now we come to the last structural part of a SAS program, calling up some SAS 
procedure(s) to do whatever statistical analysis(es) we desire. In this case we want 
correlations, and the SAS procedure for that is called CORR. Also, as mentioned 
earlier, we should always print the data. For this we use PROC PRINT. Adding 
these lines we get our complete SAS program: 


DATA LOCUS;
INPUT LOCUS ACHMOT ACHLANG @@;
LINES;
11 23 31 13 25 38 21 28 29
21 34 28 14 36 37 29 20 37
17 24 39 19 30 39 23 28 41
PROC CORR;
PROC PRINT;


Note that there is a semicolon at the end of each statement, but not for the data 
lines. 
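
To make concrete what free format with @@ accomplishes, here is a small Python sketch (outside SAS, for illustration only) that reads the same three data lines, takes the numbers three at a time as one subject's scores, and computes the three correlations from the definitional formula. The helper names are hypothetical; PROC CORR itself reports considerably more than this.

```python
import math

DATA_LINES = """
11 23 31 13 25 38 21 28 29
21 34 28 14 36 37 29 20 37
17 24 39 19 30 39 23 28 41
"""
VARIABLES = ["LOCUS", "ACHMOT", "ACHLANG"]


def read_free_format(text, n_vars):
    """Split blank-separated scores into one list of n_vars scores per subject."""
    values = [float(token) for token in text.split()]
    if len(values) % n_vars != 0:
        raise ValueError("number of scores is not a multiple of the number of variables")
    return [values[i:i + n_vars] for i in range(0, len(values), n_vars)]


def pearson_r(x, y):
    """Pearson correlation from the definitional (deviation score) formula."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


subjects = read_free_format(DATA_LINES, len(VARIABLES))
columns = {name: [row[j] for row in subjects] for j, name in enumerate(VARIABLES)}

for i, first in enumerate(VARIABLES):
    for second in VARIABLES[i + 1:]:
        r = pearson_r(columns[first], columns[second])
        print(f"r({first}, {second}) = {r:.3f}")
```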

In Table 1.1 we present some of the basic rules of the control language for SAS,
and in Table 1.2 give the complete SAS control lines for obtaining descriptive
statistics, for obtaining a set of correlations (this is the example we just went over in
detail), and for obtaining both the independent and dependent samples t tests.
Although the rules are basic, they are important. For example, failing to end a
statement in SAS with a semicolon or using a variable name longer than 8 characters
will cause the program to terminate. The four sets of control lines in Table 1.2 
show the structural similarity of the control line flow for different types of analy- 
ses. Notice in each case we start with the DATA statement, then an INPUT state- 


TABLE 1.1 
Some Basic Elements of the SAS Control Language 


Not column oriented. Columns only become relevant when using column input for the data.
SAS statements give instructions. Each statement must end with a semicolon. 


Structurally a SAS program is composed of three fundamental blocks: (1) statements setting up the 
data, (2) the data lines and (3) procedure (PROC) statements—procedures are SAS computer 
programs which read the data and do various statistical analyses. 


DATA SETUP 


First there is the DATA statement, where you are creating a data set. The name for the data set must
begin with a letter and be 8 or fewer characters.

Then there is the INPUT statement, where the variables are named and the format of the data is
specified.

Variable names must be 8 or fewer characters, must begin with a letter, and cannot contain special
characters.

We can use column input, where we indicate what column(s) the score for a variable is in. If the
variable is non-numeric then we need to put a $ after the variable name.


Example 


Suppose we have a group of subjects measured on IQ, attitude toward education, and grade point
average (GPA), and will label them as M for male and F for female. The variable list is then

SEX $ 1 IQ 3-5 ATTITUDE 7-8 GPA 10-12 .2


This tells SAS that sex (M or F) is in column 1, IQ is in columns 3 through 5, attitude in columns 7
to 8, and grade point average in columns 10 to 12. The .2 is to insert a decimal point before the last
two digits.


If we are using free format then the scores for the variables do not have to be in specific columns, 
they simply need to be separated from each other by at least one blank. 


The LINES statement follows the DATA and INPUT statements and precedes the data lines.


More than one statement can go on the same line, although for readability we recommend putting 
statements on separate lines. If we wish to do analysis on only some of the variables in the INPUT 
statement, then this is indicated in a VAR (abbreviation for variable) statement. For example, if we 
had 6 variables on the INPUT statement (X1 X2 X3 X4 X5 X6) and only wished to compute 
correlations for the first 3, then we would insert VAR X1 X2 X3; after the PROC CORR statement. 


Statistics for subgroups of subjects are obtained with the BY statement. Suppose we want the 
correlations for males and females on variables X, Y and Z. If the subjects have not been sorted on 
sex, then we sort them first using PROC SORT, and the control lines are 


PROC SORT;
BY SEX;
PROC CORR;
BY SEX;
TABLE 1.2


SAS Control Lines for Obtaining Set of Correlations, Descriptive Statistics,
and Independent and Dependent T Tests


CORRELATIONS

(1) DATA LOCUS;
(2) INPUT LOCUS ACHMOT ACHLANG @@;
(3) LINES;
    11 23 31 13 25 38 21 28 29
    21 34 28 14 36 37 29 20 27
    17 24 39 19 30 39 23 28 41
(4) PROC CORR;
    PROC PRINT;


MEANS AND STANDARD DEVIATIONS

    DATA MEANS;
    INPUT DRINK $ TACTUAL @@;
    LINES;
    A 34 A 26 A 18 A 26 A 9
    A 28 A 14 A 33 A 43 A 50
    NA 15 NA 2 NA 23 NA 7 NA 18
    NA 13 NA 9 NA 23 NA 8 NA 16
    PROC MEANS;
(7) BY DRINK;


T TEST

    DATA ATTITUDE;
(5) INPUT TREAT $ ATT @@;
    LINES;
    C 82 C 95 C 89 C 99 C 87
    C 79 C 98 C 86
    T 94 T 97 T 98 T 93 T 96
    T 99 T 88 T 92 T 94 T 95
    T 92 T 97 T 96 T 90 T 89
    PROC TTEST;
(6) CLASS TREAT;
    PROC PRINT;


DEPENDENT SAMPLES T TEST

    DATA COFFEE;
    INPUT PRODWCB PRODWITH @@;
(8) DIFF = PRODWITH-PRODWCB;
    LINES;
    23 28 35 38 29 29
    33 37 43 42 32 30
(9) PROC MEANS N MEAN T PRT;
    VAR DIFF;


(1) Here we are giving a name to the data set. Remember it must be eight or fewer characters and must begin with a
letter. Note that there is a semicolon at the end of the line, and at the end of every line for all 4 examples (except
for the data lines).

(2) Note that the names for the variables all begin with a letter and are less than or equal to 8 characters. The
double @@ is needed in order to put the data for more than one subject on the same data line; here we have data
for 3 subjects on each line.

(3) When the data is part of the control lines, as here, then this LINES command always precedes the data.

(4) PROC (short for procedure) CORR yields the correlations, and PROC PRINT gives a listing of the data.

(5) The $ after TREAT is used to denote a non-numeric variable; note in the data lines that TREAT is either
C (control) or T (treatment).

(6) We call up the t test procedure and tell it that TREAT is the grouping variable.

(7) This BY statement yields means and standard deviations for each of the subgroups defined by DRINK
(alcoholics and non-alcoholics).

(8) We create the difference variable (DIFF) on which the analysis is done.

(9) Procedure MEANS is used, with N, MEAN, T, and PRT requested: N is the number of cases, MEAN is the mean of the
difference variable, T is the test statistic, and PRT is the tail probability.
ment (naming the variables being read in and describing the format of the data),
and then the LINES statement preceding the data. Then, after the data, one or more 
PROC statements are used to perform the wanted statistical analysis, or to print the 
data (PROC PRINT). 

These 4 sets of control lines serve as useful models for running analyses of the 
same type, where only the variable names change and/or the names and number of 
variables change. For example, suppose you want all correlations on 5 attitudinal 
variables (call them x1, x2, x3, x4, and x5). Then the control lines are: 


DATA ATTITUDE;
INPUT X1 X2 X3 X4 X5 @@;
LINES;


DATA LINES 


PROC CORR; 
PROC PRINT; 


where the data lines have just been indicated schematically. 

In Table 1.3 we present some of the basic elements of the SPSS control language,
and in Table 1.4 give the complete SPSS control lines for descriptive statistics,
correlations, and the t tests for independent and dependent samples. Some of
the common errors committed in running SPSS programs are (1) using invalid
variable names, (2) failing to indent for a subcommand, and (3) not starting a
command in column 1.

It should be understood that although we give some important basic elements of 
the packages in Tables 1.1 and 1.3, and present complete control lines for various 
types of analyses in this text, our treatment is in no sense a substitute for the SAS 
and SPSS manuals. All the contingencies one might encounter in a practical prob- 
lem can't be covered in this text. One final important point before we leave the 
packages: The examples in this book were run on SPSS for Windows 12.0 or SAS. 
If you are running a different release of SPSS or SAS, some details may
differ a bit. Your instructor can help you with this.


1.8 SPSS FOR WINDOWS—RELEASE 12.0 


A fantastic bargain, in my opinion, is the SPSS GRADUATE PACK FOR
WINDOWS 12.0, which comes on a compact disk and sells to university students
for about $190. It is important to understand that you are getting the full
package, not a student version. For this release, as they note, Windows 98, 2000
Professional, NT 4.0 Workstation, ME, or XP is required, along with 128 MB
TABLE 1.3
Some Basic Elements of the SPSS Control Language 


SPSS operates on commands and subcommands 


It is column oriented to the extent that each command begins in column 1 and continues for as 
many lines as needed. All continuation lines are indented at least one column. 


Examples of Commands: TITLE, DATA LIST, BEGIN DATA 
The title can be put in apostrophes, and can be up to 60 characters. 


All subcommands begin with a keyword followed by an equals sign, then the specifications, and are 
terminated by a slash. 


Each subcommand is indented at least one column. 

The subcommands are further specifications for the commands. 

For example, if the command is DATA LIST, then DATA LIST FREE involves the subcommand FREE,
which indicates the data will be in free format.

Names for variables must be eight or fewer characters.

They must begin with a letter, or one of the following characters: @, # or $.


FREE format—the variables must be in the same order for each case but do not have to be in the 
same location. Also, multiple cases can go on the same line, with the values for the variables 
separated by blanks or commas. 


When the data is part of the command file, then the BEGIN DATA command precedes the data and 
the END DATA follows the last line of data. 


The LIST command can be used to list the data. 


We can use the keyword TO in specifying a set of consecutive variables, rather than listing all the
variables. For example, if we had the six variables X1,X2,X3,X4,X5,X6, the following
subcommands are equivalent:


VARIABLES = X1,X2,X3,X4,X5,X6/ or VARIABLES = X1 TO X6/


of RAM and 200 MB of hard disk space. If you have purchased a computer in the 
last 3 years these requirements should not be a problem. Statistical analysis is done 
on data, so getting data into SPSS or SAS is crucial. We discuss this next. 


1.9 DATA FILES 


As noted in the SPSS BASE 12.0 USER'S GUIDE (2003, p. 19), “Data files come
in a variety of formats, and this software is designed to handle many of them,
including:


TABLE 1.4
SPSS Control Lines for Obtaining a Set of Correlations, Descriptive
Statistics, and Independent and Dependent T Tests


CORRELATIONS

    TITLE 'CORRELATIONS FOR 3 VARIABLES'.
(1) DATA LIST FREE/LOCUS ACHMOT ACHLANG.
(2) BEGIN DATA.
    11 23 31 13 25 38 21 28 29
    11 34 28 14 36 37 29 20 37
    17 24 39 19 30 39 23 28 41
    END DATA.
(3) CORRELATIONS VARIABLES = LOCUS ACHMOT ACHLANG/
     PRINT = TWOTAIL/
(4)  STATISTICS = DESCRIPTIVES/.


MEANS AND STANDARD DEVIATIONS

    TITLE 'DESCRIPTIVE STATISTICS'.
    DATA LIST FREE/DRINK TACTUAL.
    VALUE LABELS DRINK 1 'ALCOHOLIC'
     2 'NON ALCOHOLIC'.
    BEGIN DATA.
    1 34 1 26 1 18 1 26 1 9
    1 28 1 14 1 33 1 43 1 50
    2 15 2 2 2 23 2 7 2 18
    2 13 2 9 2 23 2 8 2 16
    END DATA.
(8) MEANS TABLES = TACTUAL BY DRINK/.


T TEST

    TITLE 'T TEST'.
    DATA LIST FREE/TREAT ATT.
    BEGIN DATA.
(6) 1 82 1 95 1 89 1 99
    1 87 1 79 1 98 1 86
    2 94 2 97 2 98 2 93
    2 96 2 99 2 88 2 92
    2 94 2 95 2 92 2 97
    2 96 2 90 2 89
    END DATA.
(5) LIST.
(7) T-TEST GROUPS = TREAT (1,2)/
     VARIABLES = ATT/.


DEPENDENT SAMPLES T TEST

    TITLE 'COFFEE BREAK'.
    DATA LIST FREE/PWO PWITH.
    BEGIN DATA.
    23 28 35 38 29 29
    33 37 43 42 32 30
    END DATA.
(9) T-TEST PAIRS = PWO PWITH/.


(1) The FREE on this DATA LIST command is a further specification, indicating that the data will be in free
format.

(2) When the data is part of the command file, it is preceded by BEGIN DATA and terminated by END DATA.

(3) This VARIABLES subcommand specifies the variables to be analyzed.

(4) This yields the means and standard deviations for all variables.

(5) This LIST command gives a listing of the data.

(6) The first number for each pair is the group identification and the second is the score for the dependent variable.
Thus, 82 is the score for the first subject in group 1 and 97 is the score for the second subject in group 2.

(7) The t test procedure is called and the codes for the two levels of the grouping variable are put in parentheses.

(8) The MEANS procedure calculates means and variances for a dependent variable(s) over subgroups defined by
one or more classification variables. The TABLES subcommand is used to indicate for which variables the
means and variances are desired.

(9) The PAIRS subcommand names the variables being compared.
• Spreadsheets created with Lotus 1-2-3 and Excel
• Database files created with dBASE and various SQL formats
• Tab delimited and other types of ASCII text files
• Data files in SPSS format created on other operating systems
• SYSTAT data files
• SAS data files”
As the screen below shows, one can easily import files of different types into
SPSS for analysis:


[Screen: Open File dialog box, with LOOK IN set to the SPSS12 folder and boxes for FILE NAME and FILES OF TYPE]


We illustrate for an EXCEL and an SPSS file. As the above screen indicates,
one needs to tell the software where the file is located and what type of file it is. If it
is an EXCEL file (stored in MY DOCUMENTS), then one would select MY
DOCUMENTS and EXCEL for type of file. If it is an SPSS file, stored in SPSS12,
then select that location and SPSS (*.SAV) for type of file. We illustrate for an SPSS
file, which we saved as INTERMCHI. The file is:


SPSS FILE 


23 
25 
27 
29 
31 
For the SPSS file the screen would be


[Screen: Open File dialog box, with LOOK IN set to SPSS12 and FILES OF TYPE set to SPSS (*.SAV), listing the SPSS data files in that folder]


1.10 DATA ENTRY 


When SPSS is opened, the data editor provides a spreadsheet-like editor for creating
and editing data files. In this section we illustrate creating a data set within
SPSS. The data set we create has 3 variables and 10 cases. In the editor cases are
rows and variables are columns. The data editor is shown below:


[Screen: the empty Untitled - SPSS Data Editor window, with the menu bar File, Edit, View, Data, Transform, Analyze, Graphs, Utilities, Add-ons, Window, Help and columns labeled var]


The first number we wish to enter is 12. Type it in and press the forward arrow key, and you
will move laterally to the next column. There you enter the value for the 2nd variable,
i.e., 13. Press the forward arrow key again and enter 19. Now press TAB
and the box will go automatically to the first position in the second row. Type in
23 and press the TAB key. Now type in 29 and press the TAB key again. Then enter
72 and press the TAB key again. The box will go automatically to the first position
in the third row. When you are done entering all 10 cases, the screen looks
as follows:


[Screen: the datap41 - SPSS Data Editor window in DATA VIEW, showing the 10 cases (rows) on the 3 named variables (columns); the first case is 12, 13, 19 and the second is 23, 29, 72]


We have skipped a step here. Originally, the program assigns generic names to 
the variables. By switching to the VARIABLE VIEW we have given the above 
names to the variables. Switching back to the DATA VIEW we obtain the above re- 
sult. 


1.11 EDITING A DATASET 


Changing a Cell Value 


Suppose we wished to change the circled value to 23. Move to that cell. Enter the 
23 and press ENTER. The new value appears in the cell. It is as simple as that. 


Inserting a Case 


Suppose we wished to insert a case after the 7th subject. How would we do it? As 
the guide points out: 


1. Select any cell in the case (row) below the position where you want to insert
the new case.
2. From the menus choose:
DATA
INSERT CASE


A new row is inserted for the case and all variables receive the system-missing 
value. It would look as follows: 


[Figure: the data editor with the new, empty case inserted]


Suppose the new case we typed in was 35 17 63. 


Inserting a Variable 
Now we wish to add a variable after NFACULTY. How would we do it? 


1. Select any cell in the variable (column) to the right of the position where 
you want to insert the new variable. 
2. From the menus choose: 
DATA 
INSERT A VARIABLE 


When this is done, the data file in the editor looks as follows:


[Figure: the data editor with the new variable inserted after NFACULTY]


Deleting a Case 


To delete a case is also simple. Click on the row (case) you wish to delete. The en- 
tire row is highlighted. From the menus choose: 


EDIT 
CLEAR 


The selected row (case) is deleted and the cases below it move up. To illustrate,
suppose for the above data set we wished to delete case 4 (row 4). Click on 4
and choose EDIT and CLEAR. The case is deleted, and we are back to 10 cases, as
shown below:


[Figure: the data editor after case 4 is deleted]


1.12 SPLITTING AND MERGING FILES 


Split file analysis splits the data file into separate groups for analysis, based on the 
values of the grouping variable (there can be more than one). We will find this use- 
ful in Chapter 2 on assumptions when we wish to obtain the z scores within each 
group. To obtain a split file analysis, click on DATA and then on SPLIT FILE from
the dropdown menu. Select the grouping variable(s) on which you wish to split the file
and then select ORGANIZE OUTPUT BY GROUPS.
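
The idea behind a split file analysis, namely running the same computation separately within each group, can be sketched in a few lines of Python. This is an illustration only, with made-up scores and a hypothetical helper name; it computes z scores within each group, using each group's own mean and standard deviation (n - 1 in the denominator).

```python
from collections import defaultdict
from statistics import mean, stdev


def z_scores_by_group(groups, scores):
    """Return each score standardized within its own group."""
    by_group = defaultdict(list)
    for g, s in zip(groups, scores):
        by_group[g].append(s)
    # One (mean, standard deviation) pair per group.
    stats = {g: (mean(values), stdev(values)) for g, values in by_group.items()}
    return [(s - stats[g][0]) / stats[g][1] for g, s in zip(groups, scores)]


# Hypothetical data: a grouping variable and a score for 8 subjects.
groups = [1, 1, 1, 1, 2, 2, 2, 2]
scores = [10, 12, 14, 16, 30, 35, 40, 45]

for g, s, z in zip(groups, scores, z_scores_by_group(groups, scores)):
    print(f"group {g}: score {s:>2} -> z = {z:+.3f}")
```

Because each group is standardized separately, a score is judged against its own group rather than against the combined sample.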

Merging data files can be done in two different ways: (1) merging files with the
same variables and different cases, and (2) merging files with the same cases but
different variables. SPSS gives the following marketing example for the first case:
you might record the same information for customers in two different
sales regions and maintain the data for each region in separate files. We will give an
example to illustrate how one would merge files with the same variables and different
cases. As the guide notes, open one of the data files. Then, from the menus
choose:


DATA
MERGE FILES
ADD CASES


Then select the data file to merge with the open data file.
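
The two kinds of merge can be pictured with a short Python sketch. It is only an analogy to the SPSS menus: the records, variable names, and helper functions are invented for the illustration, and the first helper is stricter than the package in that it insists both files contain exactly the same variables.

```python
def add_cases(file_a, file_b):
    """Stack two files that record the same variables on different cases."""
    if file_a and file_b and set(file_a[0]) != set(file_b[0]):
        raise ValueError("both files must contain the same variables")
    return file_a + file_b


def add_variables(file_a, file_b, key):
    """Join two files that record different variables on the same cases."""
    lookup = {row[key]: row for row in file_b}
    return [{**row, **lookup[row[key]]} for row in file_a]


# Hypothetical customer records from two sales regions (same variables).
region_east = [{"ID": 1, "SALES": 120}, {"ID": 2, "SALES": 95}]
region_west = [{"ID": 3, "SALES": 140}, {"ID": 4, "SALES": 80}]
all_customers = add_cases(region_east, region_west)

# Hypothetical second file with a new variable for the same four cases.
satisfaction = [{"ID": i, "SATISF": s} for i, s in [(1, 4), (2, 3), (3, 5), (4, 2)]]
merged = add_variables(all_customers, satisfaction, key="ID")

for row in merged:
    print(row)
```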


Example 


To illustrate the process of merging files, we consider two small, artificial data sets.
We denote these data sets by MERGE1 and MERGE2, respectively, and they are
shown below:


[Figure: the MERGE1 and MERGE2 data sets]


As indicated above, we open MERGE1 and then select DATA and MERGE
FILES and ADD CASES from the dropdown menus. When we open MERGE2 the
ADD CASES window appears:


[Figure: the ADD CASES window]


When you click on OK the merged file appears, as given below:


[Figure: the merged data file]
1.13 TWO WAYS OF RUNNING ANALYSES ON SPSS


Point and Click 


Bring the data into the editor. Click on ANALYZE and scroll down to analysis de- 
sired. 


Syntax Editor 


• Click on FILE, NEW, and SYNTAX. A blank screen will appear.
• Type in the syntax.
• To run, click on RUN and then scroll down to ALL.


To illustrate both methods of doing an analysis we use the t test data from Table 
1.4. 

For the point and click method we would first bring the data into the spreadsheet 
editor. Then we click on ANALYZE, scroll down to COMPARE MEANS, and 
across to INDEPENDENT SAMPLES T TEST. 

To use the syntax editor for analysis, we first need to get to the syntax editor. 

This we do with FILE—NEW—SYNTAX. A blank sheet appears. 

We simply type in the syntax; then click on RUN and then ALL. 

Both methods will, of course, yield the same results. 
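
For readers who want to see what either route computes, the following Python sketch works out the pooled-variance t statistic for the t test data of Table 1.4 from its definitional formula. It is an illustration outside SPSS: the function name is hypothetical, and only the equal-variances form of the test is shown.

```python
import math
from statistics import mean, variance

# Scores on ATT for group 1 and group 2 (TREAT = 1, 2) from Table 1.4.
group1 = [82, 95, 89, 99, 87, 79, 98, 86]
group2 = [94, 97, 98, 93, 96, 99, 88, 92, 94, 95, 92, 97, 96, 90, 89]


def pooled_t(x, y):
    """Independent samples t statistic with pooled variance, and its df."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / std_error, df


t, df = pooled_t(group1, group2)
print(f"mean 1 = {mean(group1):.3f}, mean 2 = {mean(group2):.3f}")
print(f"t = {t:.3f} on {df} degrees of freedom")
```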


1.14 SPSS OUTPUT NAVIGATOR 


The Output Navigator was introduced in SPSS for Windows (7.0) in 1996. It is
very convenient for viewing, printing, and rearranging output. To illustrate, suppose
a survey researcher is conducting a pilot study on a 12 item scale to
check out possible ambiguous wording, whether any items are sensitive, whether
they discriminate, etc. She administers the scale to 16 subjects. The items are
scaled from 1 to 5, with 1 representing strongly agree and 5 representing strongly
disagree. The first 8 subjects are male and the last 8 are female. There is some
missing data, which is coded as a 0. She wishes to compare males and females on 3
subtests (SUBTEST1, SUBTEST2, SUBTEST3), and also to determine the internal
consistency of these subtests. We will illustrate only some of the things that can
be done with output for the above survey example. First, the entire command syntax
for running the analysis is presented below:


TITLE 'SURVEY RESEARCH WITH MISSING DATA'.
DATA LIST FREE/ID I1 I2 I3 I4 I5 I6 I7 I8 I9 I10 I11 I12 SEX.
BEGIN DATA.
11228352831 1 221221
2122838331221 11 1 
3121 3383 23321231 
4224 233223383231 
5232421230340 1 
62328332 343242 1 
7 83844 8352212 334 1 
8323443 43 3 342 1 
93342433 45 3 5 3 2 
1044 5 533 54445 3 2 
114405554 305442 
12444554 33544 52 
1344043 25 13 3 0 4 2 
1455 34444 5 3 5 5 3 2 
15 5 5 4 5 355 445 3 5 2 
146543 43 5443 22 3 2 
END DATA. 

LIST. 


MISSING VALUES ALL (0).
COMPUTE SUBTEST1 = I1+I2+I3+I4+I5.
COMPUTE SUBTEST2 = I6+I7+I8+I9.
COMPUTE SUBTEST3 = I10+I11+I12.
RELIABILITY VARIABLES = I1 TO I12/
 SCALE(SUBTEST1) = I1 TO I5/
 SCALE(SUBTEST2) = I6 TO I9/
 SCALE(SUBTEST3) = I10 I11 I12/
 STATISTICS = CORR/.
T-TEST GROUPS = SEX(1,2)/
 VARIABLES = SUBTEST1 SUBTEST2 SUBTEST3/.


This is run from the command syntax window by clicking on RUN and then on 
ALL. The first thing you want to do is save the output. To do that click on FILE and 
then click on SAVE AS from the dropdown menu. Type in a name for the output 
(we will use MISSING), and then click on OK. The output, in the navigator,
appears as follows:


[Figure: the output as it appears in the Output Navigator]


As shown above, the output is divided into two panes. The left pane gives in outline
form the analysis(es) that have been run, and the right pane has the statistical
contents. To print the entire output, simply click on FILE and then click on PRINT
from the dropdown menu. Select how many copies you want and click on OK. It is
also possible to print only part of the output. I will illustrate. Suppose we wished to
print only the reliability part of the output. Click on that in the left pane;
it is highlighted (as shown in the figure below). Click on FILE and PRINT from the
dropdown menu. Now, when the print window appears click on SELECTION and
then OK. Only the reliability part of the output will be printed.


[Figures: Output Navigator screens, including the reliability output highlighted in the left pane]


It is also easy to move and delete output in the output navigator. Suppose for the
missing data example we wished to move the output corresponding to LIST to just above
the T-Test. We simply click on LIST in the outline pane and drag it (holding the
mouse button down) to just above the T-Test and then release.

To delete output is also easy. Suppose we wish to delete the LIST output. Click
on LIST. To delete the output one can either press the DEL (delete) key on the keyboard,
or click on EDIT and then click on DELETE from the dropdown menu.

As mentioned at the beginning of this section, there are many, many other 
things one can do with output. 
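
Two pieces of the survey syntax earlier in this section can be made concrete with a short Python sketch. First, a subtest score formed by adding items is missing whenever any of its items is missing, which is how COMPUTE treats a sum written with plus signs. Second, internal consistency is commonly summarized by coefficient alpha, computed here from its definitional formula on the cases with complete data. The item responses below are hypothetical (they are not the survey data above), and the function names are made up for the illustration.

```python
from statistics import variance

MISSING_CODE = 0  # missing responses are coded 0, as in the survey example


def subtest_score(items):
    """Sum of the items, or None if any item is missing."""
    if any(value == MISSING_CODE for value in items):
        return None
    return sum(items)


def cronbach_alpha(rows):
    """Coefficient alpha from cases with no missing items (listwise deletion)."""
    complete = [row for row in rows if MISSING_CODE not in row]
    k = len(complete[0])
    item_variances = [variance([row[j] for row in complete]) for j in range(k)]
    total_variance = variance([sum(row) for row in complete])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of 7 subjects to a 3 item subtest (0 = missing).
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 3, 3],
    [4, 5, 4],
    [5, 4, 5],
    [2, 0, 3],
    [3, 4, 4],
]

print([subtest_score(row) for row in responses])
print(f"alpha = {cronbach_alpha(responses):.3f}")
```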
1.15 SAS AND SPSS OUTPUT FOR CORRELATIONS,
DESCRIPTIVES, AND t TESTS


In Table 1.5 we present SPSS for Windows 12.0 printout for the correlations and SAS
printout for the descriptive statistics. Table 1.6 presents SPSS for Windows 12.0
screens for the t test for independent samples. Table 1.7 has the SPSS for Windows
12.0 printout for the independent samples t test and the SAS printout for the
dependent samples t test.


TABLE 1.5
Correlations and Descriptive Statistics From SPSS
for Windows and SAS


[Printout: SPSS for Windows correlations and SAS descriptive statistics]


TABLE 1.6
SPSS Windows Screens for Running t Test for Independent Samples


[Screens: SPSS for Windows dialog boxes for the independent samples t test]


TABLE 1.7
t Test for Independent Samples From SPSS Windows and t Test
for Correlated Samples From SAS


[Printout: SPSS for Windows independent samples t test and SAS correlated samples t test]
1.16 DATA SETS ON COMPACT DISK


There are 5 SPSS data files and 4 ASCII (text) data files on the compact disk.
To access the SPSS files change LOOK IN to the compact disk icon and
FILE TYPE to SPSS (*.SAV). When this is done, the screen will appear as follows:


[Screen: Open File dialog box listing the 5 SPSS data files on the compact disk, with FILES OF TYPE set to SPSS (*.SAV)]


To access the ASCII(text) files leave LOOK IN as the compact disk icon, but 
change FILE TYPE to TEXT. When this is done, the screen will look as follows: 


[Screen: Open File dialog box listing the 4 ASCII (text) data files on the compact disk: alcohol, attitude, clinical, headache]
When you double click on an SPSS file, the file will go right into the spreadsheet
editor ready for analysis. For the ASCII files things are a bit more complicated.
When one double clicks on an ASCII file, the TEXT WIZARD will appear. This is
documented in SPSS BASE 12.0 USER'S GUIDE (2003, pp. 37-47) and SPSS
BASE 13.0 USER'S GUIDE (2004, pp. 39-49). This procedure can read a variety
of files. For our purpose just press NEXT several times. In the final step (step 6)
press FINISH, and the data file will appear in the spreadsheet editor ready for
analysis.


EXERCISES 


1. An advertisement in the paper claims that the average pay at Smith Indus- 
tries, a small factory, is $22,000. You are currently making $15,000 and de- 
cide to apply for a job there. Subsequently you find out that most people at 
Smith also make $15,000, and you are upset about the ad. You later deter- 
mine that the salary structure at Smith Industries is as follows: 


50 workers $15,000 each 
Managers of the two divisions at Smith $35,000 each 
Two executives $70,000 each 
Owner $250,000 


(a) Why was the $22,000 figure in the paper so misleading? 
(b) Which measure(s) of central tendency should have been used to con- 
vey a more accurate picture of the salaries at Smith? 


2. Suppose Mr. Jones had administered the same math test to each of his two 
eighth grade classes, with the results shown below: 


Class 1 Class 2 
Size 20 40 
Mean 60 80 


Mr. Jones then naively computed the average for all students by taking the 
average of the above two means, yielding 70. 

(a) Intuitively, why is this not correct? 

(b) The correct formula for finding the combined mean is to use a 
weighted average: 


$$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$


where $n_1$ and $n_2$ are the respective group sizes and $\bar{x}_1$ and $\bar{x}_2$ are the group
means. Plugging the above numbers into this formula yields the correct
overall mean of 73.33.
Now, let $x_{11}, x_{12}, \ldots, x_{1n_1}$ represent the subjects' scores in group 1 and let
$x_{21}, x_{22}, \ldots, x_{2n_2}$ represent the subjects' scores in group 2. Prove that the formula for
the combined mean is as given above.

HINT: Start with the definition for the mean for all subjects combined:

$$\bar{x}_c = \frac{(x_{11} + x_{12} + \cdots + x_{1n_1}) + (x_{21} + x_{22} + \cdots + x_{2n_2})}{n_1 + n_2}$$
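
The arithmetic in this exercise can be checked with a few lines of Python. This is a minimal sketch using the class sizes and means given above; the function name is made up for the illustration.

```python
def combined_mean(sizes, means):
    """Weighted average of group means, weighting each mean by its group size."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


sizes = [20, 40]   # Class 1 and Class 2
means = [60, 80]

naive = sum(means) / len(means)          # simple average of the two means
correct = combined_mean(sizes, means)    # weighted by class size

print(f"naive average of the means: {naive:.2f}")
print(f"combined (weighted) mean:   {correct:.2f}")
```

The weighted result is pulled toward 80 because twice as many students are in Class 2.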


3. An investigator runs a t test for independent samples on two groups of subjects
(45 subjects in group 1 and 35 in group 2). She notes that the distributions
of scores are quite positively skewed in both groups. Should she be
concerned about this?


4. (a) Suppose that in a hospital each patient's pulse is taken in the morning,
at noon, and in the evening. For two patients, on a given day, the average 
pulse readings are both 74. The records for that day show the following: 


Morning Noon Evening Mean 
Patient A 72 76 74 74 
Patient B 72 91 59 74 


Are the clinical implications for these two patients the same? Explain. 

(b) You are a scout for the Boston Celtics professional basketball team and 
are looking for a guard. From scouting reports you focus in on two guards 
from Duke and UCLA, schools that play on a similar level of competition. 
It is noted that each guard averaged 20 points over all games in his senior 
year, so you decide to examine their performance game by game: 


UCLA guard: 21, 18, 19, 23, 25, 20, 22, 17, 23, 24 etc. 
Duke guard: 13, 35, 28, 11, 8, 40, 22, 31, 15, 29 etc. 


(a) Which guard might you prefer, and why? 


(b) What is the main point that each of the two parts of this exercise illus- 
trates? 


5. (a) Suppose that $c = 3$, $x_1 = 5$, $x_2 = 8$, $x_3 = 1$, and $x_4 = 7$. Evaluate the 
following: 

$$\sum_{i=1}^{4} c\,x_i$$


(b) Prove that the variance of a constant times a variable is equal to $c^2$ 
times the variance of x; that is, prove that 

$$s_{cx}^2 = c^2 s_x^2$$

Hint: The scores on cx may be represented as $cx_1, cx_2, \ldots, cx_n$. Apply the 
definitional formula for variance to this set of scores and mathematically 
rearrange. 


(c) Suppose there are 10 subjects in each of three groups. The means for 
the groups are $\bar{x}_1 = 4.1$, $\bar{x}_2 = 6.3$, and $\bar{x}_3 = 8.5$. Evaluate the following: 

$$\sum_{i=1}^{3} 10(\bar{x}_i - \bar{x})^2$$

where $\bar{x}$ is the grand mean for all the subjects. 


6. A team of researchers is comparing two diets, a behavior modification approach 
and the Beverly Hills diet, in their effect on weight loss for a group 
of overweight women. Suppose that the data for the 10 subjects in each diet 
are as follows: 


Behavioral Modification Beverly Hills 
10 14 8 12 
15 197 16 10 
11 32 13 7 
22 15 4 15 
8 9 10 1 


Label the independent variable here DIET and the dependent variable 
WGTLOSS. Use column format, with group identification (1 or 2) in column 1 
and weight loss in columns 3 and 4. Show the complete SAS control 
lines for running the t test for independent samples on these data. 


7. A researcher has the following heights and weights on 14 men: 


Subject 1 2 3 4 5 6 7 8 9 10 11 12 13 14 


Height 67 66 63 61 68 69 69 70 71 73 75 74 TI 71 
Weight 148 161 152 145 169 162 170 183 174 115 205 186 233 158 


The heights are given in inches. 

(a) Compute the correlation between height and weight. What conclu- 
sion would you draw? 

(b) Check the data to see if there is an outlier. 

(c) Compute the correlation without the outlier. Now, what do you con- 
clude? 


8. A worker in a neighborhood clinic wishes to assess the impact of showing 
an educational film on patient compliance in taking an antihypertension 
medication. The diastolic blood pressure is the dependent variable here. 
During the study the medication dosage is kept constant. Blood pressure 
was measured one week prior to the film, then the film was shown, and the 
blood pressure was measured again three weeks later. The data are: 


Patient 1 2 3 4 5 6 7 8 9 10 
Before 110 105 98 100 89 82 113 102 101 118 
After 100 95 88 92 83 86 100 101 96 112 


(a) Denote the diastolic blood pressure by DIASTOL and the treatment 
(independent variable) by FILM. Show the complete set of control lines for 
running the t test for dependent samples on SPSS to determine whether 
there was a significant change in the blood pressure. 


9. Run a t test for independent samples with TREAT as the grouping variable 
and attitude (ATT) as the dependent variable. Use either SAS (Table 1.2) 
or SPSS (Table 1.4). 


10. Run a t test for dependent samples with SAS (Table 1.2) or SPSS (Table 
1.4). 


11. Consider the following small data set: 


X Y Z 
1 12 13 15 
2 11 9 8 
3 7 5 3 
4 2 4 6 
5 1 5 3 
6 8 9 7 


Edit this data set using SPSS or SAS in the following ways: 
(a) Change 13 to 25. 

(b) Insert a case after Case 3. 

(c) Insert a variable after Y. 


APPENDIX 
OBTAINING THE MEAN AND VARIANCE 
ON THE TI-30Xa CALCULATOR 


We consider the data for the children in group 1 (example in Section 1.2), which is 
as follows: 


10, 13, 7, 12, 13, 11, 8, 14, 9, 12 


TI-30Xa 
STEP                          DISPLAY 
Enter 10 and press Σ+         n=1 (indicates 1 data point has been entered) 
Enter 13 and press Σ+         n=2 (indicates 2 data points entered) 
Enter 7 and press Σ+          n=3 


Enter the remaining 7 data points, and at that point the display will show n=10, indicating that 
10 data points have been entered. 


Press 2ND and then x̄          10.9 (this is the mean) 
Press 2ND and then σxn-1      2.33095 (standard deviation) 
Press x²                      5.4333 (variance) 
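
The calculator results above can be cross-checked in software. The following is a minimal sketch in Python (a language choice made here for illustration; the book itself works with SAS and SPSS), using only the standard library. It assumes the calculator's standard deviation key uses the n - 1 divisor, which is consistent with the displayed values.

```python
import statistics

# Scores for the children in group 1 (from the appendix above)
scores = [10, 13, 7, 12, 13, 11, 8, 14, 9, 12]

mean = statistics.mean(scores)          # arithmetic mean
sd = statistics.stdev(scores)           # sample standard deviation (divides by n - 1)
variance = statistics.variance(scores)  # sample variance (divides by n - 1)

print(f"n = {len(scores)}")
print(f"mean = {mean:.1f}")
print(f"standard deviation = {sd:.5f}")
print(f"variance = {variance:.4f}")
```

The printed mean, standard deviation, and variance should agree with the calculator display (10.9, 2.33095, and 5.4333).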


CHAPTER 2 

One Way Analysis of Variance 


2.1 INTRODUCTION 


One of the statistical tests encountered in introductory statistics courses is the t test 
for independent samples. This test is appropriate when comparing two groups of 
subjects on a single dependent variable. Three classical applications of this test are 
given below: 


1. Comparing a treatment group against a control group. 

2. Comparing the relative efficacy of treatment 1 vs. treatment 2. 

3. Comparing two intact groups (such as males and females or two social 
classes) on some dependent variable. 


However, in many situations we may wish to compare more than two groups 
simultaneously on a dependent variable. In these cases a different statistical 
technique, called analysis of variance (ANOVA), is needed. We consider 7 examples 
below: 


1. A counselor wishes to compare the effectiveness of two types of counseling* 
(Rogerian and Adlerian) on changing the attitude of low achieving 
high school students toward school. She also has a control group in her 
study. Thus, there are 3 groups being compared on the dependent variable 
of attitude toward school. 

2. A psychologist wants to determine if five drugs have a differential effect on 
reaction time (dependent variable) for 100 subjects, 20 of whom have been 
randomly assigned to each drug. 

3. A dietician wishes to discover whether four diets produce differential 
weight loss (the dependent variable) for 80 overweight women. Here diets 
are the treatments, and we have four groups being compared. 

4. A researcher for a school district wishes to determine whether the reading 
achievement on a standardized test differs on the average for six elementary 
schools in similar socioeconomic areas. Here we have six groups (the 
schools) being compared on reading achievement (the dependent variable). 

5. A marketing researcher wishes to determine if shelf location has an effect 
on volume of sales of a product. If there are 4 shelf locations (from high to 
low), then we have a 4 group ANOVA, with sales as the dependent variable. 

6. We wish to determine whether violent crime varies in different regions of 
the country. If there are 7 different regions being compared, then we have a 
7 group ANOVA. 

7. Doughnuts absorb fat in various amounts when they are cooked. Suppose 
an experiment is set up involving three types of fat: peanut oil, corn oil, and 
lard. We would have a 3 group ANOVA, with amount of fat absorbed as the 
dependent variable. 


*This chapter deals with what is called fixed effects ANOVA, since counseling methods, teaching 
methods, diets, etc. are generally not randomly sampled from some population of methods or diets. 
Thus, our inferences are fixed to the counseling methods, teaching methods or diets under consideration. 
Further elaboration of this point is given in Section 4.7, where we distinguish between fixed and 
random effects ANOVA. 


Let us review some of the basic terminology concerned with hypothesis testing 
before considering how to do an ANOVA. Recall that with the t test we 
talked about testing a null hypothesis versus some alternative hypothesis, which 
looked like this: 


$H_0: \mu_1 = \mu_2$ (population means are equal) 
$H_1: \mu_1 \neq \mu_2$ 


Why is it called the null hypothesis? Because to say that the population 
means are equal is equivalent to saying that their difference is null, i.e., that 
$\mu_1 - \mu_2 = 0$. Also, in testing the null hypothesis the notion of testing at some level of 
significance was encountered. What does it mean to do a t test at the $\alpha = .05$ 
level of significance? This means we are taking a 5% chance of rejecting the null 
hypothesis when it is true, that is, saying the groups differ when in fact they do 
not. Level of significance is also called the probability of making a type I error. 

Notice that the alternative hypothesis ($H_1$) for the t test is very simple, since 
either the two groups are equal or they differ. In analysis of variance (for k 
groups) the alternative is much more complex. What is the null hypothesis for a 
one way analysis of variance with k groups? It is that the k population means are 
equal: 


$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$ 


The alternative hypothesis here is more complicated than that for the t test. 
Let us consider the four group case to illustrate. If we reject the null hypothesis 
it could be for various reasons. It might be because only two groups are different, 
or it might be because only 3 groups are different, or because group 1 differs 
only from groups 3 and 4, or because all 4 groups are different, etc. How can we 
characterize all these possibilities into an alternative hypothesis? Notice that in 
all the above cases at least two of the groups differed. Thus, a way of stating the 
alternative hypothesis is as follows: 


$H_1$: At least two of the $\mu_i$ are different. 


2.2 RATIONALE FOR ANOVA 


Now that we know what the null hypothesis is that is being tested in a one way 
ANOVA, we might ask the following questions: "Why bother doing an ANOVA 
when comparing k groups? Why not simply do several t tests?" To see why the latter 
is problematic, let us consider the four group case ($T_1$, $T_2$, $T_3$, $T_4$). There are six 
paired comparisons here (12, 13, 14, 23, 24, 34). We could do six t tests, each at the 
.05 level, to determine which of these pairs are significantly different. Now, for just 
one of these t tests the $\alpha$ level is under control at .05. But for the set of 6 tests the $\alpha$ 
level gets out of control, since there is a 5% risk of false rejection for each test. We 
define the overall $\alpha$ level for a set of tests as the probability of at least one type I error 
(false rejection) when $H_0$ is true. Now, it can be shown that if $\alpha$ is small, then 
overall $\alpha \approx r\alpha$, where r is the number of tests being done. Actually, $r\alpha$ is an upper 
bound on the overall $\alpha$ level. Let us use this to see how rapidly the overall $\alpha$ inflates 
as the number of groups increases: 


Number of groups    Number of t tests    Approximate overall $\alpha$ 
3                   3                    .15 
4                   6                    .30 
5                   10                   .50 
6                   15                   .75 


This table shows that if we were to compare five groups with 10 t tests, each at 
the .05 level, then we have an approximate 50% chance of at least one false rejection. 
Thus, the probability of a false rejection here is uncomfortably high. For 
six groups and 15 t tests, at least one false rejection is very likely (approximately 
.75)! Thus, it should be clear that using multiple t tests in a k group situation 
is not the way to proceed. We see later on in the chapter (Section 2.15) that a 
much tighter upper bound than $r\alpha$ can be put on overall $\alpha$, especially when each 
test is done at the $\alpha = .05$ level. 
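
The inflation of the overall α can be tabulated directly. The sketch below, in Python (chosen here for illustration), counts the pairwise tests for k groups and computes the rα upper bound used in the table. For comparison it also computes 1 - (1 - α)^r, which would be the exact overall α only if the r tests were independent; that independence is an assumption made here for illustration, since pairwise t tests on the same groups are not independent. The function name `overall_alpha` is hypothetical.

```python
from math import comb

def overall_alpha(k, alpha=0.05):
    """Return (number of pairwise tests, r*alpha bound, value under independence)."""
    r = comb(k, 2)                      # number of paired comparisons among k groups
    upper_bound = min(1.0, r * alpha)   # the r*alpha upper bound from the text
    independent = 1 - (1 - alpha) ** r  # assumes the r tests are independent
    return r, upper_bound, independent

for k in (3, 4, 5, 6):
    r, bound, indep = overall_alpha(k)
    print(f"{k} groups: {r:2d} t tests, r*alpha = {bound:.2f}, "
          f"1 - (1 - alpha)^r = {indep:.3f}")
```

The second column of output reproduces the table; the third shows that rα overstates the risk somewhat, although the overall α is still far above .05 in every row.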


2.3 NUMERICAL EXAMPLE 


The analysis of variance procedure, which is appropriate, was developed by R. A. 
Fisher in an agricultural context back in the 1920s. ANOVA is based on the follow- 
ing three assumptions: 


1. The observations are normally distributed on the dependent variable in 
each group. 

2. The population variances for the groups are equal. 

3. The observations in each group are independent. 

We consider the effect of violations of these assumptions in detail in Section 
2.7. 

For our example, suppose a consumer organization wants to compare the price 
of a particular toy in three types of stores in a suburban county: variety stores, 
department stores, and discount toy stores. A random sample of 3 variety stores, 4 
department stores, and 5 discount toy stores is selected and the following prices (in 
dollars) are recorded: 


Variety              Dept.              Discount 
3                    4                  4 
6                    7                  5 
8                    9                  2 
                     8                  3 
                                        5 
$\bar{x}_1 = 5.67$   $\bar{x}_2 = 7$    $\bar{x}_3 = 3.8$ 


We wish to test whether there is a difference in the average 
prices on this toy for the populations of stores from which these stores were 
selected. 

The null hypothesis that is being tested here is 


$H_0: \mu_1 = \mu_2 = \mu_3$ 


The sample means above are estimating the population means: 


$\bar{x}_1 \rightarrow \mu_1, \quad \bar{x}_2 \rightarrow \mu_2, \quad \bar{x}_3 \rightarrow \mu_3$ 


We wish to determine whether the sample means differ sufficiently, given sam- 
pling error, to suggest that the underlying population means differ. To determine 
this the ANOVA computes and compares two basic sources of variation: 


1. Between group variation: how much the group means vary 
about the grand (overall) mean. 

2. Within group variation: how much the scores of subjects who are in 
the same group vary. Variation here is primarily due to individual 
differences. 


Between Group Variation 
The general formula here is given by 

$$SS_b = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2$$

$$SS_b = n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \cdots + n_k(\bar{x}_k - \bar{x})^2$$ (1) 

where $\sum$ is the summation symbol, $n_i$ denotes the number of subjects in the ith 
group, $\bar{x}$ denotes the grand mean, and $SS_b$ stands for sum of squares between. It is a 
weighted sum of squares, where each deviation is weighted by the number of subjects 
in that group. 

For the above data this becomes 


$$SS_b = \sum_{i=1}^{3} n_i(\bar{x}_i - 5.33)^2$$
$$= n_1(\bar{x}_1 - 5.33)^2 + n_2(\bar{x}_2 - 5.33)^2 + n_3(\bar{x}_3 - 5.33)^2$$
$$= 3(5.67 - 5.33)^2 + 4(7 - 5.33)^2 + 5(3.8 - 5.33)^2$$
$$= .3468 + 11.1556 + 11.7045 = 23.2069$$


In calculating the grand (overall) mean above it is simplest to add up all the 
scores and divide by total number of subjects. Thus, in the above case this yields 
$\bar{x} = 64/12 = 5.33$. One can also obtain the grand mean from the individual means 
with the following formula: 


$$\bar{x} = (n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k)/N$$


where $n_1$ is the number of subjects in group 1, $n_2$ is the number of subjects in 
group 2, etc., and N represents total number of subjects. Note that this is a weighted 
average and that means based on a larger number of subjects receive greater weight 
in determining the grand mean. Because of this it is not appropriate to find the 
grand mean with unequal group sizes by simply taking the average of the means, 
a mistake frequently made. 
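
To make the point about weighting concrete, the following sketch (Python, used here for illustration) computes the grand mean for the toy price data three ways: from all 12 scores, from the weighted formula, and by naively averaging the three group means. The variable names are illustrative.

```python
# Toy prices by store type (from the example above)
groups = {
    "variety": [3, 6, 8],
    "department": [4, 7, 9, 8],
    "discount": [4, 5, 2, 3, 5],
}

sizes = {name: len(scores) for name, scores in groups.items()}
means = {name: sum(scores) / len(scores) for name, scores in groups.items()}
N = sum(sizes.values())

# 1. Add up all the scores and divide by the total number of subjects
grand_from_scores = sum(sum(scores) for scores in groups.values()) / N

# 2. Weighted average of the group means (weights are the group sizes)
grand_weighted = sum(sizes[name] * means[name] for name in groups) / N

# 3. Naive (unweighted) average of the group means: not the grand mean here
naive_average = sum(means.values()) / len(means)

print(f"grand mean from scores = {grand_from_scores:.4f}")
print(f"weighted grand mean    = {grand_weighted:.4f}")
print(f"naive average of means = {naive_average:.4f}")
```

The first two values agree (64/12, about 5.33), while the unweighted average of the means comes out near 5.49 because it gives the three variety stores as much weight as the five discount stores.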

We need the mean sum of squares between ($MS_b$), since this represents a variance 
(we see why in Section 2.5). $MS_b$ is simply sum of squares between ($SS_b$) divided 
by degrees of freedom, i.e., 


$$MS_b = SS_b / (k - 1) = 23.2069 / (3 - 1) = 11.6035$$ (2) 


Within Group Variation 


Verbally, within group variability is calculated by deviating each score in group 1 
about the mean in group 1, and squaring and summing these deviations. We then 
square the deviations of all scores of group 2 about the mean for group 2 and sum 
them, and so forth on to the kth group. We then pool (add) these squared deviations 
to obtain the sum of squares within, denoted by $SS_w$. Symbolically then, it looks 
like 


$$SS_w = \sum_i (x_{i1} - \bar{x}_1)^2 + \sum_i (x_{i2} - \bar{x}_2)^2 + \cdots + \sum_i (x_{ik} - \bar{x}_k)^2$$ (3) 

where $x_{i1}$ is the score of the ith subject in group 1, $x_{i2}$ is the score of the ith subject 
in group 2, and $x_{ik}$ is the score of the ith subject in group k. Now we calculate $SS_w$ 
for the above data: 


$SS_w = (3 - 5.67)^2 + (6 - 5.67)^2 + (8 - 5.67)^2$ (variability within gp 1) 
$+ (4 - 7)^2 + (7 - 7)^2 + (9 - 7)^2 + (8 - 7)^2$ (variability within gp 2) 
$+ (4 - 3.8)^2 + (5 - 3.8)^2 + (2 - 3.8)^2 + (3 - 3.8)^2 + (5 - 3.8)^2$ (variability within gp 3) 


$SS_w = 33.4667$ is the pooled within group variability. Once again we need $MS_w$ (mean 
sum of squares within), which represents a variance, rather than $SS_w$. The formula 
for $MS_w$ is 


$$MS_w = SS_w / (N - k) = 33.4667 / (12 - 3) = 3.7185$$ (4) 


where N denotes the total number of subjects. 


The F Test 


To test the tenability of the null hypothesis the following F statistic is used: 
$F = MS_b/MS_w$. Thus, for our data this is 


$$F = MS_b/MS_w = 11.6035/3.7185 = 3.12$$ (5) 


To determine whether this is large enough to reject $H_0$ we must ascertain if this 
is a very unlikely value to occur if indeed the null hypothesis is true. To get at this 
we need to refer to the sampling distribution of F under $H_0$ (assuming the population 
means are equal). Here we must think conceptually as follows. If we were to 
draw samples of sizes 3, 4, and 5 repeatedly from populations with equal means, 
and compute an F ratio for each draw, what would the distribution of F's look like? 
This is the sampling distribution of F under $H_0$. Statisticians have determined that 
the distribution will be positively skewed, with a modal value of approximately 1. 
We will sketch the distribution shortly. 

Now, suppose we are testing $H_0$ at $\alpha = .05$. The above sampling distribution will 
have the following pair of degrees of freedom: 


$df_b = k - 1$ (degrees of freedom between) and 
$df_w = N - k$ (degrees of freedom within) 


Thus, for our data $df_b = 3 - 1 = 2$ and $df_w = 12 - 3 = 9$. For an F distribution with 2 
and 9 degrees of freedom it has been determined that the 95th percentile point is 
4.26 (the point corresponding to testing at the .05 level). That is, if the null hypothesis 
is true, then only 5% of the time would we expect to obtain an F greater than 4.26. 
Thus, 4.26 is our critical value, and if the value of the test statistic is greater than 
4.26 we will reject $H_0$. For our case F = 3.12, so we fail to reject $H_0$ and conclude 
that it is possible that the population means are equal. The sampling distribution is 
sketched below: 


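[Figure: sampling distribution of F under $H_0$ (vertical axis: frequency), with the region for rejecting $H_0$ marked in the upper tail.] 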


The results of an ANOVA are typically summarized in a table as follows: 


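Source     SS                                                                                               df       MS               F 
Between    $\sum_i n_i(\bar{x}_i - \bar{x})^2$                                                              k - 1    $SS_b/(k - 1)$   $MS_b/MS_w$ 
Within     $\sum_i (x_{i1} - \bar{x}_1)^2 + \sum_i (x_{i2} - \bar{x}_2)^2 + \cdots + \sum_i (x_{ik} - \bar{x}_k)^2$   N - k    $SS_w/(N - k)$ 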


For the above example this table would be: 


Source SS df MS F 
Between 23.2069 2 11.6035 3.12 
Within 33.4667 9 3.7185 
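
The whole calculation can be reproduced from the definitional formulas in a few lines. The sketch below uses Python for illustration (the book itself runs ANOVA through SAS and SPSS); the function name `one_way_anova` and the dictionary keys are hypothetical, and the critical value 4.26 is simply the tabled value quoted in the text.

```python
def one_way_anova(groups):
    """One way ANOVA from the definitional formulas (Equations 1 to 5)."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]

    # Between: each squared deviation of a group mean is weighted by group size
    ss_b = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within: squared deviations of scores about their own group mean, pooled
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_b, df_w = k - 1, N - k
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {"ss_b": ss_b, "ss_w": ss_w, "df_b": df_b, "df_w": df_w,
            "ms_b": ms_b, "ms_w": ms_w, "F": ms_b / ms_w}

# Toy prices: variety, department, and discount stores
prices = [[3, 6, 8], [4, 7, 9, 8], [4, 5, 2, 3, 5]]
result = one_way_anova(prices)

critical_value = 4.26  # F(.05; 2, 9), as quoted in the text
for name, value in result.items():
    print(f"{name:5s} = {value:.4f}")
print("reject H0" if result["F"] > critical_value else "fail to reject H0")
```

Carrying full precision gives a between sum of squares of 23.2 rather than 23.2069; the small difference arises because the hand calculation rounds the means to two decimals. The F ratio still rounds to 3.12, and the conclusion (fail to reject) is unchanged.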


The critical values for ANOVAs with varying sample size and different $\alpha$ levels 
have been tabled, and are found in Table B.1 at the end of this book. To give the 
reader practice in using these tables we consider two examples. 


Example 1 

An experimenter runs a 3 group ANOVA with 10 subjects per group, testing at $\alpha$ = 
.10. He obtains F = 2.16. Does he reject the null hypothesis? First we find degrees 
of freedom between and within: $df_b = 3 - 1 = 2$ and $df_w = 30 - 3 = 27$. Reference to 
Table B.1 then shows that the critical value is $F_{.10;\,2,27} = 2.51$, and thus he would fail 
to reject the null hypothesis. 


Example 2 


An investigator runs a 4 group ANOVA with the following sample sizes: $n_1 = 15$, $n_2 = 
20$, $n_3 = 10$, and $n_4 = 25$. She will test at $\alpha = .01$. Will she reject $H_0$ if she obtains F 
= 5.26? The degrees of freedom are $df_b = 3$ and $df_w = 70 - 4 = 66$. Reference to 
Table B.1 shows that the critical value for 3 and 66 degrees of freedom is not in the table. 
What do we do here? Note that the tabled values for error degrees of freedom are 
given from 1 to 30 and then jump to 40, 60, 120, and infinity. The reason is that the 
critical values change very little once the degrees of freedom get beyond 30. We 
could interpolate in our case between 60 and 120, but since the values change so little 
our recommendation is simply to use the critical value for the closest error degrees 
of freedom, which here is 60. Thus, the critical value is $F_{.01;\,3,60} = 4.13$. Since 
the value of F = 5.26 is greater than 4.13, we reject the null hypothesis. 

When we reject the null hypothesis at some $\alpha$ level all we know is that there is 
an overall difference among the groups. To locate where the differences lie (e.g., 
which pairs of groups are significantly different) we need some post hoc (Latin for 
"after this") procedure. Many such post hoc procedures have been developed, 
and we consider these in Section 2.10. 


2.4 EXPECTED MEAN SQUARES 


Earlier we stated that the modal value of F in the sampling distribution under $H_0$ 
was about 1; i.e., this is the value we would expect to obtain most frequently if indeed 
the population means are equal. In other words, we were saying that the expected 
value of F is about 1, or in symbols $E(F) \approx 1$. The reason this is true is because 
of the expected values for $MS_b$ and $MS_w$ under the null hypothesis. 

The reader can think of expected value as the long term average. As a simple example, 
consider flipping a coin 1,000 times. Then, if it is a fair coin, we would expect 
about 500 heads and 500 tails, that is, the expected proportions are E(H) = E(T) = .5. 
In the ANOVA context we think of repeating the experiment thousands of times, and computing an $MS_b$ 
and $MS_w$ each time. Now, it can be shown that if we were to average the thousands 
of $MS_w$'s, then the average would be $\sigma^2$. Recall that $\sigma^2$ was the assumed common 
population variance for the groups. That is, $\sigma^2$ involved one of the assumptions 
underlying the analysis of variance. Thus we have 


$$E(MS_w) = \sigma^2$$ (when $H_0$ is true) (6) 


Also, it is important to note that the size of $MS_w$ does not depend on whether the 
population means differ or not. However, we will see that $E(MS_b)$ does depend 
on differences in population means. It can be shown that 


$$E(MS_b) = \sigma^2 + n\sum_{i=1}^{k}(\mu_i - \mu)^2/(k - 1)$$ (7) 

If the population means are equal ($H_0$ is true), then this means $\mu_1 = \mu_2 = \cdots = \mu_k 
= \mu$ and the second term in the above expression will be 0, or in other words, 


$$E(MS_b) = \sigma^2$$ (when $H_0$ is true) (8) 


Thus, when the null hypothesis is true, the expected values for numerator and 
denominator of the F ratio are equal and the expected value of the F statistic is equal to 
about 1. The reason that E(F) is not exactly equal to 1 is because the expected value 
of a quotient is not equal to the quotient of expected values. 

Thus, evidence in favor of rejecting $H_0$ will be reflected in an F ratio greater 
than 1. How much greater than 1 the F must be to reject $H_0$ depends a great deal on 
sample size. The next chapter on power deals with this issue in detail. 
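
The "long term average" idea can be checked by simulation. The sketch below (Python, for illustration only) repeatedly draws k = 3 groups of n = 4 normal scores with equal population means and σ² = 1, and averages the resulting mean squares and F ratios. The design (k, n, σ, number of replications, seed) and the function name `simulate_under_h0` are arbitrary choices made here, not values from the text.

```python
import random

def simulate_under_h0(k=3, n=4, sigma=1.0, reps=5000, seed=1):
    """Average MS_b, MS_w, and F over repeated samples drawn with H0 true."""
    rng = random.Random(seed)
    total_ms_b = total_ms_w = total_f = 0.0
    for _ in range(reps):
        # All groups share the same population mean (0) and variance sigma^2
        groups = [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(k)]
        means = [sum(g) / n for g in groups]
        grand = sum(means) / k  # equal n, so the plain average is the grand mean
        ms_b = n * sum((m - grand) ** 2 for m in means) / (k - 1)
        ms_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (k * (n - 1))
        total_ms_b += ms_b
        total_ms_w += ms_w
        total_f += ms_b / ms_w
    return total_ms_b / reps, total_ms_w / reps, total_f / reps

avg_ms_b, avg_ms_w, avg_f = simulate_under_h0()
print(f"average MS_b = {avg_ms_b:.3f}")
print(f"average MS_w = {avg_ms_w:.3f}")
print(f"average F    = {avg_f:.3f}")
```

Both mean squares should average close to σ² = 1, in line with Equations 6 and 8, while the average F comes out somewhat above 1. That agrees with a standard result not stated in the text: the mean of an F distribution is df_w/(df_w - 2), which is 9/7 (about 1.29) for this design.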


2.5 $MS_w$ AND $MS_b$ AS VARIANCES 


We mentioned earlier that $MS_w$ and $MS_b$ actually represent variances. Now we 
show this algebraically. This is easiest to see for equal n per group and so that is 
demonstrated. Using Equations 3 and 4, we can write $MS_w$ as 


$$MS_w = \left[\sum_i (x_{i1} - \bar{x}_1)^2 + \sum_i (x_{i2} - \bar{x}_2)^2 + \cdots + \sum_i (x_{ik} - \bar{x}_k)^2\right] / (N - k)$$


For equal n per group, we have N = nk and therefore N - k = nk - k = 
k(n - 1). Thus, we can rewrite the above equation as 


$$MS_w = \frac{1}{k(n - 1)}\left[\sum_i (x_{i1} - \bar{x}_1)^2 + \sum_i (x_{i2} - \bar{x}_2)^2 + \cdots + \sum_i (x_{ik} - \bar{x}_k)^2\right]$$


or 


$$MS_w = \frac{1}{k}\left[\underbrace{\frac{\sum_i (x_{i1} - \bar{x}_1)^2}{n - 1}}_{\text{variance for gp 1}} + \underbrace{\frac{\sum_i (x_{i2} - \bar{x}_2)^2}{n - 1}}_{\text{variance for gp 2}} + \cdots + \underbrace{\frac{\sum_i (x_{ik} - \bar{x}_k)^2}{n - 1}}_{\text{variance for gp k}}\right]$$


Recall from beginning statistics that each is in the form of a variance, since the 
variance for a single group of subjects is $s^2 = \sum_i (x_i - \bar{x})^2/(n - 1)$. 

Thus, for equal group size, $MS_w$ is just the average of the sample variances for 
the groups. 

Using Equations 1 and 2, we can write $MS_b$ as follows: 


$$MS_b = \left[n(\bar{x}_1 - \bar{x})^2 + n(\bar{x}_2 - \bar{x})^2 + \cdots + n(\bar{x}_k - \bar{x})^2\right] / (k - 1)$$


or 


$$MS_b = n \underbrace{\sum_{i=1}^{k} (\bar{x}_i - \bar{x})^2 / (k - 1)}_{\text{variance for k group means about the grand mean}}$$


Thus, $MS_b$ is a weighted variance of the group means about the grand mean. 
This is somewhat more subtle, but note that except for the n we have the form of a 
variance, where the group means are playing the role of individual observations 
and the grand mean is playing the role of the mean for a single group. 
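
Both identities are easy to verify numerically for equal group sizes. The sketch below uses Python for illustration and a small made-up data set (three groups of four scores, not taken from the text): it computes the mean squares from Equations 1 through 4 and compares them with the average of the group variances and with n times the variance of the group means.

```python
from statistics import mean, variance

# Illustrative equal-n data (hypothetical, not from the text)
groups = [[4, 6, 5, 9], [7, 8, 10, 11], [2, 3, 5, 6]]
k = len(groups)
n = len(groups[0])

group_means = [mean(g) for g in groups]
grand_mean = mean(group_means)  # equal n, so this equals the grand mean

# Mean squares from the definitional formulas
ss_w = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
ms_w = ss_w / (n * k - k)
ss_b = sum(n * (m - grand_mean) ** 2 for m in group_means)
ms_b = ss_b / (k - 1)

# The same quantities expressed as variances
average_of_variances = mean([variance(g) for g in groups])
n_times_variance_of_means = n * variance(group_means)

print(f"MS_w = {ms_w:.4f}, average of group variances = {average_of_variances:.4f}")
print(f"MS_b = {ms_b:.4f}, n x variance of group means = {n_times_variance_of_means:.4f}")
```

Each pair of printed values should match. With unequal group sizes the first identity no longer holds exactly, because the pooled variance then weights each group's variance by its degrees of freedom.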


2.6 A LINEAR MODEL FOR THE DATA 


We now state the linear model for each subject's score on which the one way 
ANOVA is based. The model for the score of the ith subject in group j ($y_{ij}$) is given 
by: 


$$y_{ij} = \mu + \alpha_j + e_{ij}$$ (9) 


where $\mu$ is the grand mean for all subjects, $\alpha_j = \mu_j - \mu$ is the treatment effect for the 
jth treatment, and $e_{ij}$ is the random error for the ith subject in the jth treatment. 

Thus, we are postulating that a subject's score is composed of three parts: (1) a 
general effect, the grand mean; (2) an effect unique and constant within a given 
treatment ($\alpha_j$); and (3) an effect that is unpredictable, that is, $e_{ij}$. It is assumed that 
the $e_{ij}$ are independent, normally distributed within each treatment, and have the 
same variance for each treatment. Note that these assumptions for the $e_{ij}$ imply 
exactly the same assumptions for the $y_{ij}$ (the subjects' scores), since $e_{ij}$ is the only random 
part of the model on the right side of Equation 9. 

To gain some feeling for the above linear model, consider three treatments with 
4 subjects in each treatment. Suppose there is just a general effect (i.e., no treatment 
effect or random error). Then the data would look like this: 


MODEL: $y_{ij} = \mu$ 
T1    T2    T3 
20    20    20 
20    20    20 
20    20    20 
20    20    20 


Next, suppose there is in addition a treatment effect but no random error. Then 
the data might look like this: 


MODEL: $y_{ij} = \mu + \alpha_j$ 
T1    T2    T3 
19    17    24 
19    17    24 
19    17    24 
19    17    24 


In the above we have $\alpha_1 = -1$, $\alpha_2 = -3$, and $\alpha_3 = 4$. 

But both of the above situations are too simple for real data, since subjects' 
scores will essentially always vary within each treatment group. The main reason 
they will differ is because of individual differences (they come to the treatments 
with different capabilities, backgrounds, motivation, etc.). Measurement error also 
contributes to within treatment variability. Since the $y_{ij}$ will vary, this implies the 
random error components will vary. If we now add an error component to each 
subject's score we might obtain a realistic data set like this: 


MODEL: $y_{ij} = \mu + \alpha_j + e_{ij}$ 
T1    T2    T3 
18    18    23 
24    19    22 
21    11    21 
17    16    28 


Here we have added an error component of $e_{11} = -1$ for the first subject's score in 
treatment 1, an $e_{21} = 5$ for the second subject in treatment 1, an $e_{31} = 2$ for the third 
subject, etc. 
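
The three-part decomposition can be made explicit for this data set. The sketch below (Python, for illustration) takes μ = 20 and the treatment effects given above, recovers each error component as the score minus (μ + α_j), and then rebuilds the scores from the three parts. The dictionary layout and names are illustrative choices.

```python
mu = 20                                  # general effect (grand mean)
alpha = {"T1": -1, "T2": -3, "T3": 4}    # treatment effects from the text

# The "realistic" data set above, one list of scores per treatment
scores = {
    "T1": [18, 24, 21, 17],
    "T2": [18, 19, 11, 16],
    "T3": [23, 22, 21, 28],
}

# Error component for each subject: e_ij = y_ij - (mu + alpha_j)
errors = {t: [y - (mu + alpha[t]) for y in ys] for t, ys in scores.items()}

# Rebuild every score from its three parts to confirm the decomposition
rebuilt = {t: [mu + alpha[t] + e for e in es] for t, es in errors.items()}

for t in scores:
    print(t, "errors:", errors[t])
print("decomposition reproduces the data:", rebuilt == scores)
```

For treatment 1 this gives errors of -1, 5, 2, and -2, matching the values named in the text.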


2.7 ASSUMPTIONS IN ANOVA 


We mentioned earlier that the analysis of variance is based on the following three 
assumptions: 


1. The observations are normally distributed on the dependent variable in 
each group. 

2. The population variances for the groups are equal. This is the so-called 
homogeneity of variance assumption. In symbols this would be 
$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$. 

3. The observations are independent. 


Why is it important to study the assumptions underlying ANOVA? Because in 
ANOVA we set up a mathematical model based on the assumptions, and all 
mathematical models are approximations to reality. Therefore, violations of the 
assumptions are inevitable. The salient question becomes, "How radically must a given 
assumption be violated before it has a serious effect on type I and type II error rates?" 
Thus, we may set our $\alpha = .05$ and think we are rejecting falsely 5% of the time, but 
if a given assumption is violated we may be rejecting falsely 40% of the time. For 
these kinds of situations we would certainly want to be able to detect such violations 
and take some corrective action. But not all violations of assumptions are serious, 
and hence it is crucial to know which assumptions to be particularly concerned 
about, and under what conditions. Before we begin our review of a considerable 
literature on violations of assumptions in ANOVA, it is helpful to cover some basic 
terminology that is needed in discussing the results of Monte Carlo (i.e., simulation) 
studies. 

The nominal $\alpha$ (level of significance) is the level set by the experimenter, and is 
the percent of time one is rejecting falsely when the null hypothesis is true and all 
assumptions are met. The actual $\alpha$ is the percent of time one is rejecting falsely if 
one or more of the assumptions is violated. We say a test statistic is robust if the 
actual $\alpha$ is very close to the nominal $\alpha$. 

Numerous studies have examined the effect of violations of assumptions in 
ANOVA, and an excellent summary of this literature has been provided by Glass, 
Peckham, and Sanders (1972). Their review indicates that non-normality has only 
a slight effect on the type I error rate, even for very skewed or kurtotic distributions. 
For example, the actual $\alpha$'s for some very non-normal populations were only 
.055 or .06: very minor deviations from the nominal level of .05. We say the F 
statistic is robust with respect to the normality assumption. 

The reader may be puzzled as to how this can be. The basic reason is the Central 
Limit Theorem, which states that the sum of independent observations having any 
distribution whatsoever approaches a normal distribution as the number of 
observations increases. To be somewhat more specific, Bock (1975) notes, "even for 
distributions which depart markedly from normality, sums of 50 or more observations 
approximate to normality. For moderately non-normal distributions the 
approximation is good with as few as 10 to 20 observations" (p. 111). Now, since the sums 
of independent observations approach normality rapidly, so do the means, and the 
sampling distribution of F is based on means. Thus, the sampling distribution of F 
is only slightly affected, and therefore the critical values when sampling from 
normal and non-normal distributions will not differ by much. 
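
The speed with which means approach normality can be seen in a small simulation. The sketch below (Python, for illustration) draws samples from an exponential population, which is strongly positively skewed, and reports the skewness of the resulting sample means for n = 1, 10, and 50. The choice of population, the sample sizes, the number of replications, the seed, and the function names are all assumptions made for this illustration.

```python
import random

def skewness(values):
    """Moment-based skewness: third central moment over variance to the 3/2."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def skewness_of_means(n, reps=4000, seed=0):
    """Skewness of the means of `reps` samples of size n from an exponential population."""
    rng = random.Random(seed)
    means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]
    return skewness(means)

skew_by_n = {n: skewness_of_means(n) for n in (1, 10, 50)}
for n, value in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {value:.2f}")
```

A normal distribution has skewness 0. For n = 1 the means are just the raw scores, so the skewness is near that of the exponential population (2); it falls steadily as n grows, which is the Central Limit Theorem at work.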

Lack of normality due to skewness also has only a slight effect on power (a few 
hundredths). Platykurtosis (a flattened out distribution relative to the normal) does 
affect power, and the effect can be substantial for small n. 

Now, we deal with the second assumption, homogeneity of the population 
variances. If the group sizes are equal or approximately equal (largest/smallest < 1.5), 
then the F statistic is robust for unequal variances. That is, the actual $\alpha$ stays close 
to the nominal $\alpha$ (level of significance). The only time one need worry is when the 
group sizes are sharply unequal (largest/smallest > 1.5) and a statistical test shows 
that the population variances are unequal. For this class of situations the studies 
have found that if the large variances are associated with the small group sizes, 
then F is liberal. A statistic being liberal means we are rejecting falsely too often, 
i.e., the actual $\alpha$ > nominal $\alpha$. Thus, an experimenter may think he or she is 
rejecting falsely 5% of the time (nominal $\alpha$), but in fact the true rejection rate may be 
11% (actual $\alpha$). On the other hand, when the large variances are associated with the 
large group sizes, then the F statistic is conservative. This means the actual $\alpha$ < 
nominal $\alpha$. Many researchers would not consider this serious; however, note that 
the smaller $\alpha$ will cause a decrease in power. 

There are many statistical tests for homogeneity of variance (e.g., Bartlett's, 
Cochran's, Hartley's Fmax), but these all suffer from being very sensitive to 
non-normality. That is, one may reject with these tests and conclude that the 
population variances are different when in fact the rejection may have been due to 
non-normality in the underlying populations. Fortunately there is a test, due to 
Levene, which is somewhat more robust against non-normality, and it is available 
on SAS and SPSS. 


Examples 


Consider the three data situations below. In which (if any) of the cases would you 
be concerned? 


          Case 1            Case 2            Case 3 
GROUPS    1    2    3       1    2    3       1    2    3    4 
n         17   18   20      20   50   30      10   30   15   25 
$s^2$     15   90   42      80   10   35      50   100  70   140 


In Case 1 there is no need to be concerned since the group sizes are 
approximately equal (20/17 < 1.5), and therefore F is robust. In Case 2 the group sizes are 
sharply unequal (50/20 > 1.5), so there will be a problem if a statistical test shows 
the population variances to be different. We use Hartley's Fmax = largest 
variance/smallest variance, assuming normality is not a problem, and find that Fmax = 
80/10 = 8. Referring to Table B.4, and using the average group size (35) to enter the 
table, we find that the critical value at .05 is about 2.4. Thus, we conclude the 
population variances are different. In this case the large sample variances are associated 
with the smaller group sizes, so that F will be liberal. What is to be done here? 
There are at least 3 possibilities. One is to do an ANOVA which does not assume 
equal variances. Another choice is to simply test at a more stringent α level (say 
.01), realizing that the actual α will probably be in the vicinity of .05. A third choice is 
to seek help from a statistician on a variance stabilizing transformation (such as 
square root or log). I would recommend either of the first two choices for applied 
researchers. 

In Case 3 we again have a potential problem because the group sizes are sharply 
unequal (30/10 > 1.5). Here Fmax = 140/50 = 2.8. Using Table B.4 and an average 
group size of 20 to enter the table, we find that the critical value = 3.29, so Fmax is 
not significant. Therefore there is no problem here since the assumption is tenable. 
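
The two screening quantities used in these cases are easy to compute. The sketch below is only illustrative: the function names are hypothetical, the group sizes and variances are those read from the Case 2 and Case 3 columns above, and the critical values for Fmax must still be taken from Table B.4, since nothing here computes them.

```python
def size_ratio(group_sizes):
    """Largest group size divided by smallest group size."""
    return max(group_sizes) / min(group_sizes)


def hartley_fmax(variances):
    """Hartley's Fmax: largest sample variance over smallest sample variance."""
    return max(variances) / min(variances)


def sharply_unequal(group_sizes, cutoff=1.5):
    """True when largest/smallest exceeds the 1.5 rule of thumb from the text."""
    return size_ratio(group_sizes) > cutoff


# Group sizes and sample variances for Case 2 and Case 3
case2 = {"n": [20, 50, 30], "var": [80, 10, 35]}
case3 = {"n": [10, 30, 15, 25], "var": [50, 100, 70, 140]}

for name, case in (("Case 2", case2), ("Case 3", case3)):
    print(name, "size ratio:", size_ratio(case["n"]),
          "Fmax:", hartley_fmax(case["var"]))
```

Only when `sharply_unequal` is true does the Fmax value need to be compared against the tabled critical value.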

So far we have treated heterogeneity of variance as a nuisance, something we 
wish will not happen so that the analysis on the means can proceed accurately. 
However, unequal variances, or a focus on dispersion, in some situations may be an 
interesting and important finding. Raudenbush and Bryk (1987) cite a study by 
Bryk (1977) in which a compensatory program that increased mean achievement 
also increased dispersion in achievement. As they note, such a program might in- 
crease the number of children failing to attain some minimum standard even 
though it raised mean achievement. Also, occasionally variance reduction is an ex- 
plicit goal of educational programs, as in some mastery learning programs 
(Bloom, 1984). 


2.8 THE INDEPENDENCE ASSUMPTION 


Although we have listed the independence assumption last, it is by far the most 
important assumption, for even a small violation of it produces a substantial effect on 
both the level of significance and the power of the F statistic. Just a small amount 
of dependence among the observations causes the actual α to be several times 
greater than the nominal α. Dependence among the observations is measured by 
the intraclass correlation R, where: 


$$R = \frac{MS_b - MS_w}{MS_b + (n - 1)MS_w} \qquad (10)$$


$MS_b$ and $MS_w$ are the numerator and denominator of the F statistic and n is the 
number of subjects per group. 
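
As a small sketch of Equation 10, the function below computes R from the two mean squares. The function name and the mean squares in the example are hypothetical, chosen only to show the arithmetic.

```python
def intraclass_correlation(ms_between, ms_within, n_per_group):
    """Equation 10: R = (MS_b - MS_w) / (MS_b + (n - 1) * MS_w)."""
    return (ms_between - ms_within) / (ms_between + (n_per_group - 1) * ms_within)


# Hypothetical mean squares, for illustration only
r = intraclass_correlation(ms_between=50.0, ms_within=10.0, n_per_group=5)
print(round(r, 4))
```

Note that R is 0 when the two mean squares are equal (F = 1), and grows toward 1 as the between mean square dominates.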

Table 2.1, from Scariano and Davenport (1987), shows precisely how dramatic 
an effect dependence has on type I error. For example, for the 3 group case with 
10 subjects per group and moderate dependence (intraclass correlation = .30) the 
actual α is .5379! Also, for 3 groups with 30 subjects per group and small 
dependence (intraclass correlation = .10) the actual α is .4917, almost 10 times the 
nominal α of .05! Notice also from the table that for a fixed value of the intraclass 
correlation the situation does not improve with larger sample size, but gets far worse. 
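
For the 3 group case these error rates can be checked by hand, because an F distribution with 2 numerator degrees of freedom has a simple closed-form tail probability. The sketch below rests on an assumption that is mine rather than the table's stated derivation: observations within a group are equally correlated (correlation rho) and groups are independent, so that the usual F ratio is inflated by the factor (1 + (n - 1)rho)/(1 - rho). Under that assumption the function reproduces the two values quoted above.

```python
def actual_alpha_three_groups(n, rho, nominal_alpha=0.05):
    """Actual type I error rate for a one way ANOVA with 3 groups of size n.

    Assumes an equicorrelation model: within-group correlation rho,
    independent groups (an illustrative assumption, see the lead-in).
    """
    nu = 3 * (n - 1)                          # error degrees of freedom
    shrink = (1 - rho) / (1 + (n - 1) * rho)  # rescales the nominal critical value
    # For F(2, nu): P(F > x) = (1 + 2x/nu) ** (-nu/2), so the nominal critical
    # value satisfies 2 * F_crit / nu = nominal_alpha ** (-2/nu) - 1.
    scaled_crit = nominal_alpha ** (-2 / nu) - 1
    return (1 + shrink * scaled_crit) ** (-nu / 2)


print(round(actual_alpha_three_groups(10, 0.30), 4))
print(round(actual_alpha_three_groups(30, 0.10), 4))
```

With rho = 0 the function returns the nominal level, as it should.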

Now let us consider some situations in social science research where depend- 
ence among the observations will be present. Teaching methods studies constitute 
a broad class of situations where dependence is undoubtedly present. For example, 
a few troublemakers in a classroom would have a detrimental effect on the 
achievement of many children in the classroom. Thus, their posttest achievement would be 
at least partially dependent on the disruptive classroom atmosphere. On the other 
hand, even in a good classroom atmosphere, dependence is introduced, for the 
achievement of many of the children will be enhanced by the positive learning situ- 
ation. Therefore, in either case (positive or negative classroom atmosphere), the 
achievement of the children is not independent of the other children in the class- 
room. 

Another situation I came across recently in which dependence among the obser- 
vations was present involved a study comparing the achievement of students work- 
ing in pairs at microcomputers vs. students working in groups of three at the 
micros. Here, if Bill and John are working at the same microcomputer, then obvi- 
ously Bill’s achievement is partially influenced by John. The proper unit of analy- 
sis in this study is the mean achievement for each pair and triplet of students, as it is 
plausible to assume that the achievement of students on one micro is independent 
of the students working at the other micros. 

Glass and Hopkins (1984) make the following statement concerning situations 
where independence may or may not be tenable: “Whenever the treatment is indi- 
vidually administered, observations are independent. But where treatments in- 
volve interaction among persons, such as ‘discussion’ method or group counsel- 
ing, the observations may influence each other” (p. 353). 


What Should Be Done With Correlated Observations? 


Given the results in Table 2.1 for a positive intraclass correlation, one route 
investigators should seriously consider if they suspect that the nature of their study will 
lead to correlated observations is to test at a more stringent level of significance. 
For the 3 and 5 group cases in Table 2.1 with 10 observations per group and 
intraclass correlation = .10, the error rates are 5 to 6 times greater than the assumed 
level of significance of .05. Thus, for this type of situation it would be wise to test 
at α = .01, realizing that your actual error rate will be about .05 or somewhat 
greater. For the 3 and 5 group cases in Table 2.1 with 30 observations per group and 
intraclass correlation = .10, the error rates are about 10 times greater than .05. 
Here, it would be advisable to either test at α = .01, realizing that actual α will be 
about .10, or test at an even more stringent level than .01. 

If several small groups (counseling, social interaction, etc.) are involved in each 
treatment, and there is clear reason to suspect that subjects' observations will be 
correlated within groups, but that observations will not be correlated across the 
groups, then consider using the group mean as the unit of analysis. Of course this 
will reduce the effective sample size considerably; however, this will not cause as 
drastic a drop in power as some have feared. The reason is that the means are much 
more stable than individual observations and hence the within variability will be 
far less. 


TABLE 2.1 
Actual Type I Error Rates for Correlated Observations in a One Way 
ANOVA (Nominal α = .05) 

                             Intraclass Correlation 
  m     n     .00     .01     .10     .30     .50     .70     .90     .95     .99 
  2     3   .0500   .0522   .0740   .1402   .2374   .3819   .6275   .7339   .8800 
       10   .0500   .0606   .1654   .3729   .5344   .6752   .8282   .8809   .9475 
       30   .0500   .0848   .3402   .5928   .7205   .8131   .9036   .9335   .9708 
      100   .0500   .1658   .5716   .7662   .8446   .8976   .9477   .9640   .9842 
  3     3   .0500   .0529   .0837   .1866   .3430   .5585   .8367   .9163   .9829 
       10   .0500   .0641   .2227   .5379   .7397   .8718   .9639   .9826   .9966 
       30   .0500   .0985   .4917   .7999   .9049   .9573   .9886   .9946   .9990 
      100   .0500   .2236   .7791   .9333   .9705   .9872   .9966   .9984   .9997 
  5     3   .0500   .0540   .0997   .2684   .5149   .7808   .9704   .9923   .9997 
       10   .0500   .0692   .3151   .7446   .9175   .9798   .9984   .9996  1.0000 
       30   .0500   .1192   .6908   .9506   .9888   .9977   .9998  1.0000  1.0000 
      100   .0500   .3147   .9397   .9945   .9989   .9998  1.0000  1.0000  1.0000 
 10     3   .0500   .0560   .1323   .4396   .7837   .9664   .9997  1.0000  1.0000 
       10   .0500   .0783   .4945   .9439   .9957   .9998  1.0000  1.0000  1.0000 
       30   .0500   .1594   .9119   .9986  1.0000  1.0000  1.0000  1.0000  1.0000 
      100   .0500   .4892   .9978  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000 

m = number of groups 
n = number of observations per group 

Table 2.2, from Barcikowski (1981), shows that if the effect size is medium or 
large, then the number of groups needed per treatment for power > .80 doesn't have 
to be that large. For example, at α = .10, intraclass correlation = .10, and medium 
effect size, 10 groups (of 10 subjects each) are needed per treatment. For power > 
.70 (which I consider adequate) at α = .15 one probably could get by with about 5 
or 6 groups of 10 per treatment. This is a rough estimate, since it involves double 
extrapolation. 

Before we leave the topic of correlated observations, we wish to mention an in- 
teresting paper by Kenny and Judd (1986), who discuss how non-independent ob- 
servations can arise because of several factors, grouping being one of them. The 
following quote from their paper is important to keep in mind for applied research- 
ers: 


Throughout this article we have treated nonindependence as a statistical nuisance, to 
be avoided because of the bias that it introduces.... There are, however, many 
occasions when nonindependence is the substantive problem that we are trying to 
understand in psychological research. For instance, in developmental psychology, a 
frequently asked question concerns the development of social interaction. 
Developmental researchers study the content and rate of vocalization from infants for 
cues about the onset of interaction. Social interaction implies nonindependence 
between the vocalizations of interacting individuals. To study interaction 
developmentally, then, we should be interested in nonindependence not solely as a statistical 
problem but also a substantive focus in itself. ... In social psychology, one of the 
fundamental questions concerns how individual behavior is modified by group contexts. 
(p. 431) 


TABLE 2.2 
Number of Groups per Treatment Necessary for Power > .80 in a Two 
Treatment Level Design 

                            Intraclass Correlation 
                         .10                      .20 
          Number per     Effect Size(a)           Effect Size(a) 
α level     group       .20   .50   .80          .20   .50   .80 
  .05         10         73    13     6          107    18     8 
              15         62    11     5           97    17     8 
              20         56    10     5           92    16     7 
              25         53    10     5           89    16     7 
              30         51     9     5           87    15     7 
              35         49     9     5           86    15     7 
              40         48     9     5           85    15     7 
  .10         10         57    10     5           83    14     7 
              15         48     9     4           76    13     6 
              20         44     8     4           72    13     6 
              25         41     8     4           69    12     6 
              30         39     7     4           68    12     6 
              35         38     7     4           67    12     5 
              40         37     7     4           66    12     5 

(a) .20 = small effect size 
    .50 = medium effect size 
    .80 = large effect size 


2.9 ANOVA ON SPSS AND SAS 


Now we consider how to run a one way ANOVA on SPSS and SAS, and how to in- 
terpret the printout. To illustrate we shall use the following 4 group data set: 
Group 1 Group 2 Group 3 Group 4 
2 7 4 8 
3 9 4 4 
5 11 5 7 
6 8 7 
3 


The complete SAS control lines for obtaining the ANOVA and Tukey proce- 
dure are presented in Table 2.3, and the annotated SAS printout is given in Table 
2.4. We also ran the above data on SPSS for Windows 12.0, and appropriate 
screens are given in Table 2.5. First click on ANALYZE, then scroll down to 
COMPARE MEANS and over to ONE WAY ANOVA. When you click on ONE 
WAY ANOVA and select Y as the dependent variable and GPID as the factor, the 
screen appears as in the middle of Table 2.5. When you select POST HOC and se- 
lect TUKEY, the screen appears as in the bottom of Table 2.5. Selected ANOVA 
and Tukey printout from SPSS for Windows 12.0 appears in Table 2.6. 


TABLE 2.3 
SAS Control Lines for One Way ANOVA and Tukey Procedure 
on Sample Problem 


    DATA INTERM; 
(1) INPUT GPID Y @@; 


    PROC ANOVA; 
(4) CLASS GPID; 
    MODEL Y = GPID; 
    MEANS GPID/TUKEY; 
(5) PROC PRINT; 


(1) There is a semicolon at the end of every SAS command, except for the data lines. The @@ is 
needed in order to put data for more than one subject on the same line. 

(2) The first number of each pair is the group identification of the subject and the second number is 
the score on the dependent variable. 

(3) This PROC MEANS is necessary to obtain the means on the dependent variable in each group. 

(4) The ANOVA procedure is called and GPID is identified as the grouping (independent) variable 
through this CLASS statement. 

(5) This procedure provides a listing of the data. 
TABLE 2.4 
Selected Printout from SAS ANOVA for a One Way ANOVA 


TABLE 2.5 
SPSS for Windows 12.0 Screens for One Way ANOVA and Tukey 
Procedure on Sample Problem 


TABLE 2.6 
Selected ANOVA and Tukey Procedure Printout From SPSS for Windows 
12.0 
p Values (Tail Probabilities) 


In Tables 2.4 and 2.6 the reader will note to the right of the F statistic for the 
ANOVA the following for SAS (PR > F) and for SPSS (SIG), with a numerical 
value of .0196 in both cases. Although labeled somewhat differently by the 
packages, these are p values, or tail probabilities. The p value is the probability of obtaining an F 
larger than 4.857 when the null hypothesis is true (population means are equal). If 
we had set α = .05 a priori, then we would reject H0 here since we are willing to 
take a 5% chance of rejecting falsely, and the tail probability indicates there is only 
about a 2% chance. These tail probabilities, which are printed out on all the major 
statistical packages, eliminate the need to look up critical values. We can adopt the 
following rules: 


tail prob. < α level => reject at that α level 
tail prob. > α level => fail to reject at that α level 


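[Diagram: F distribution under H0, with frequency on the vertical axis and F = 3.49, 
4.49, and 5.95 marked on the horizontal axis. The area to the right of 3.49 (tail prob. 
for α = .05) is shaded, the area to the right of 4.49 (tail prob. = .025) is lined, and 
the area to the right of 5.95 has tail prob. = .01.] 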


Students often get confused when told to reject if the tail probability is less than 
the α level, since they may have been told repeatedly in an introductory statistics 
course to reject if the value of the test statistic is greater than the critical value. The 
connection that needs to be made here is to see that if the tail probability is < α 
level, then the value of the test statistic must be in the critical region. To illustrate 
this, suppose that in the computer problem we had tested F for significance by 
using the critical value at the .05 level, which is 3.49. It is very important to note that 
the critical value cuts off a tail probability of .05, i.e., only 5% of F values will be 
greater than 3.49 if H0 is true (this area is shaded in the diagram). Now, any 
value of F greater than 3.49 (in the critical region) must have a tail probability less 
than .05. Why? Because any F > 3.49 will have a smaller area to its right under the F 
distribution, and this area represents the tail probability. For example, F = 4.49 has a tail 
probability = .025 (the lined area), while F = 5.95 has a tail probability = .01. Notice 
that both of these Fs (4.49 and 5.95) are in the critical region. 

Huberty (1987) has written an interesting article in which he discusses p values 
and notes: 


The lack of discussion in textbooks written for behavioral science researchers is 
somewhat puzzling in light of the common practice of reporting P-values (the 
lowercase p is often used) in journal articles, and in light of attention paid to them in 
publication manuals (e.g., American Psychological Association). (p. 5) 


2.10 POST HOC PROCEDURES 


As mentioned earlier, there are numerous post hoc procedures available for 
determining where the differences lie after the F statistic has indicated there is a 
significant overall difference. Among the post hoc procedures are the Tukey, Scheffé, 
Newman-Keuls, Duncan, and Fisher's LSD (the so-called protected t test). All 
these procedures have two fundamental purposes: 


1. To enable us to ferret out where the differences lie, and 

2. To maintain the overall α level (or experimentwise error rate) at some 
predetermined level, usually set at .05. In other words, keep a lid on the 
probability of false rejections for all the tests being done. 


Unfortunately, the Newman-Keuls, Duncan, and Fisher's LSD do not control 
overall α, as claimed. That is, they tend to be liberal. On the other hand, the Scheffé 
procedure tends to be quite conservative (since it allows for a wide range of 
comparisons to be done). We discuss and illustrate the Scheffé procedure in Section 
2.12. For paired comparisons we favor and present the Tukey procedure for 3 
reasons: 


1. The Tukey does control the overall α as claimed (Hayter, 1984). 

2. The Tukey procedure examines a focused, meaningful, and easily 
interpreted set of comparisons, that is, all paired comparisons. 

3. The Tukey is a fairly powerful procedure for detecting differences. 


Thus the Tukey provides a nice balance in terms of controlling on both type I 
and type II errors, while focusing on meaningful, easily interpreted comparisons. 

However, if you are only interested in comparing each of several treatment 
groups against a control group, then the Dunnett (1955) procedure is most power- 
ful and should be used (see Exercise 16). 
2.11 TUKEY PROCEDURE 


The Tukey procedure, which is sometimes called the HSD (honestly significant 
difference) test, enables us to examine all pairwise group comparisons with the 
experimentwise (overall) α level held in check. The studentized range statistic 
(which we denote by q) is used in the procedure, and the critical values for it are 
given in Table B.2 in the back of the book. The procedure establishes a set of 
simultaneous confidence intervals for each pair of population means. The intervals are 
given by: 


$$(\bar{x}_i - \bar{x}_j) \pm q_{\alpha;k,N-k}\sqrt{MS_w/n} \qquad (11)$$


or 


$$(\bar{x}_i - \bar{x}_j) - q_{\alpha;k,N-k}\sqrt{MS_w/n} < \mu_i - \mu_j < (\bar{x}_i - \bar{x}_j) + q_{\alpha;k,N-k}\sqrt{MS_w/n}$$


where $\bar{x}_i$ and $\bar{x}_j$ represent the means for any two groups, q is just a tabled value, 
$MS_w$ is the denominator of the F statistic, and n is the assumed common group size. 

In deriving the procedure, Tukey assumed equal group sizes. In practice, 
however, often the group sizes are not equal. Does this severely limit the utility of the 
procedure? No, since various studies (Dunnett, 1980; Keselman, Murray, & 
Rogan, 1976) indicate that the Tukey still controls overall α provided that the 
population variances are equal and that the n in Equation 11 is replaced by the harmonic 
mean $2n_i n_j/(n_i + n_j)$ when comparing groups i and j. The harmonic mean for each 
pair of groups is what is used by default for both SAS and SPSS. When this 
replacement is made, it is called the Tukey-Kramer procedure. Now let us consider 
a numerical example to illustrate how to calculate and interpret the intervals. 


Example 


Consider the following 4 group problem with unequal group sizes. The reader may 
check with Hartley's Fmax that homogeneity of variance is tenable here. 


Group             1     2     3     4 
$n_i$            20    17    14    18 
$\bar{x}_i$       7     8    10    13 
$s_i$             4     5     6     4 


A one way ANOVA on this data yields F = 129.02/22.22 = 5.81 (p < .05). 
Therefore we know there is a significant overall difference among the groups. To 
locate the pairs that are significantly different we use the Tukey procedure, with 
overall α = .05. First, we need the harmonic means for each pair of groups: 

Groups       Harmonic mean 

1 and 2      2(20)(17)/37 = 18.38 
1 and 3      2(20)(14)/34 = 16.47 
1 and 4      2(20)(18)/38 = 18.95 
2 and 3      2(17)(14)/31 = 15.35 
2 and 4      2(17)(18)/35 = 17.49 
3 and 4      2(14)(18)/32 = 15.75 


The tabled value is $q_{.05;4,65} = 3.74$. Now we set up the intervals: 


Differences                     Critical value                          Confidence intervals 
$\bar{x}_1 - \bar{x}_2 = -1$    $3.74\sqrt{22.22/18.38} = 4.11$    (-5.11, 3.11) 
$\bar{x}_1 - \bar{x}_3 = -3$    $3.74\sqrt{22.22/16.47} = 4.34$    (-7.34, 1.34) 
$\bar{x}_1 - \bar{x}_4 = -6$    $3.74\sqrt{22.22/18.95} = 4.05$    (-10.05, -1.95) 
$\bar{x}_2 - \bar{x}_3 = -2$    $3.74\sqrt{22.22/15.35} = 4.50$    (-6.50, 2.50) 
$\bar{x}_2 - \bar{x}_4 = -5$    $3.74\sqrt{22.22/17.49} = 4.22$    (-9.22, -.78) 
$\bar{x}_3 - \bar{x}_4 = -3$    $3.74\sqrt{22.22/15.75} = 4.44$    (-7.44, 1.44) 


Note that the lower limit for the first interval is obtained by subtracting the 
critical value (4.11) from the difference in the means (-1) and the upper limit is found 
by adding the critical value to -1. Therefore 


$$-1 - 4.11 < \mu_1 - \mu_2 < -1 + 4.11 \quad \text{or} \quad -5.11 < \mu_1 - \mu_2 < 3.11$$


How do we interpret these intervals? First, if the confidence interval includes 0 
we conclude the population means are not different. Why? Because if the interval 
includes 0 it means 0 is a possible value for $\mu_i - \mu_j$, which is to say it is possible that 
$\mu_i = \mu_j$. Thus, in comparing groups 1 and 2 above, we see that the interval for 
$\mu_1 - \mu_2$ is given by 


$$-5.11 < \mu_1 - \mu_2 < 3.11$$


Therefore, 0 is a possible value for $\mu_1 - \mu_2$, since 0 is in the interval. Thus, 
groups 1 and 2 are not significantly different. On the other hand, if the interval does 
not include 0, then the groups are significantly different, since 0 is not a possible 
value for the population mean difference. Examining the above intervals, we find 
that only groups 1 and 4, and groups 2 and 4 are significantly different. 
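
The interval calculations for this example can be sketched in a few lines. The function name is hypothetical, and the studentized range value q = 3.74 is simply the tabled value quoted in the text, passed in rather than computed.

```python
from itertools import combinations
from math import sqrt


def tukey_kramer_intervals(means, sizes, ms_within, q_crit):
    """Simultaneous intervals for every pair of group means (Equation 11),
    with n replaced by the pairwise harmonic mean of the group sizes."""
    intervals = {}
    for i, j in combinations(range(len(means)), 2):
        harmonic_n = 2 * sizes[i] * sizes[j] / (sizes[i] + sizes[j])
        half_width = q_crit * sqrt(ms_within / harmonic_n)
        diff = means[i] - means[j]
        intervals[(i + 1, j + 1)] = (diff - half_width, diff + half_width)
    return intervals


means = [7, 8, 10, 13]
sizes = [20, 17, 14, 18]
intervals = tukey_kramer_intervals(means, sizes, ms_within=22.22, q_crit=3.74)

# A pair of groups differs significantly when its interval excludes 0.
significant = [pair for pair, (lo, hi) in intervals.items() if lo > 0 or hi < 0]

for pair, (lo, hi) in intervals.items():
    print(pair, round(lo, 2), round(hi, 2))
print("significant pairs:", significant)
```

The flagged pairs are groups 1 and 4 and groups 2 and 4, in agreement with the intervals worked out by hand.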

Confidence intervals are more informative than tests of significance because 
they both indicate significance and give a range of values within which the 
population mean difference probably lies. Thus, confidence intervals are one way of 
judging the practical significance of results. Consider groups 1 and 4. Suppose a 
researcher had decided a priori that the population mean difference had to be at least 
4 units to be of any practical significance. Now, the confidence interval for groups 
1 and 4 is: 


$$-10.05 < \mu_1 - \mu_4 < -1.95$$


and the result would not be practically significant because the difference could be 
as small as -1.95. 


2.12 THE SCHEFFÉ PROCEDURE 


The big advantage of the Scheffé procedure is its flexibility. For a k group problem 
one can examine all possible simple (pairwise) and complex contrasts (this is 
defined very shortly) among the group means with the assurance that the overall α 
will be less than some preassigned value (say .05). Thus in exploratory research 
this procedure provides the ultimate in data snooping potential. However, in order 
to keep overall α = .05, while doing all the statistical tests for the very large number 
of comparisons possible, the critical value necessary for significance will be large 
relative to what it would be for other multiple comparison procedures. This means 
power will suffer, which will not be of concern if sample size in your study is large 
(say about 100 subjects per group). If your group sizes are small (about 20 subjects 
per group), however, then on power considerations it would be wise to set overall α 
at .10, or even at .15. 

In general, for k groups with population means $\mu_1, \mu_2, \ldots, \mu_k$, a contrast among 
the population means is given by 

$$L = c_1\mu_1 + c_2\mu_2 + \cdots + c_k\mu_k$$

where the sum of the coefficients ($c_i$) must equal 0. Note, first of all, that all the 
paired comparisons tested by the Tukey procedure are contrasts. Why? The 
general form for a paired comparison is $\mu_i - \mu_j$ for the ith and jth groups. The 
coefficient for $\mu_i$ is 1, while the coefficient for $\mu_j$ is -1. But the sum of these coefficients 
is 0 and therefore we have a contrast. 

To illustrate the wide variety of contrasts possible with the Scheffé, consider a 4 
group problem, with population means $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$. First, we can test all 
paired contrasts for significance (as with the Tukey). Recall that there are 6 paired 
comparisons (1 vs. 2, 1 vs. 3, 1 vs. 4, 2 vs. 3, 2 vs. 4, and 3 vs. 4). But, in addition, 
we can test all kinds of complex contrasts involving more than two groups for 
significance. Below we list just four possible complex contrasts: 


$$L_1 = \mu_1 - (\mu_2 + \mu_3)/2 \qquad L_2 = (\mu_1 + \mu_2) - (\mu_3 + \mu_4)$$
$$L_3 = \mu_1 - (\mu_2 + \mu_3 + \mu_4)/3 \qquad L_4 = \mu_2 - (\mu_3 + \mu_4)/2$$

Remember that for each of the above to be a contrast the sum of the coefficients 
must be 0. We show this below for the first two and leave as an exercise for the 
reader to show that $L_3$ and $L_4$ are contrasts. 

For $L_1$ the coefficients are $c_1 = 1$, $c_2 = c_3 = -.5$. Therefore, $c_1 + c_2 + c_3 = 1 + (-.5) 
+ (-.5) = 0$, and $L_1$ is a contrast. For $L_2$ the coefficients are $c_1 = c_2 = 1$ and $c_3 = c_4 = 
-1$. Again, $c_1 + c_2 + c_3 + c_4 = 1 + 1 + (-1) + (-1) = 0$, and $L_2$ is a contrast. 

As with the Tukey procedure, the Scheffé method establishes a set of 
simultaneous confidence intervals for all population mean contrasts. The lower and upper 
limits for the intervals are: 


$$\hat{L} - \hat{\sigma}_{\hat{L}}\sqrt{(k-1)F_{\alpha;k-1,N-k}} \quad \text{and} \quad \hat{L} + \hat{\sigma}_{\hat{L}}\sqrt{(k-1)F_{\alpha;k-1,N-k}} \qquad (12)$$


where $\hat{L}$ is the estimate of the contrast and $\hat{\sigma}_{\hat{L}}$ is the estimated standard error of the 
contrast. Now, the estimate for a general contrast is obtained by replacing the 
population means by sample means, that is, $\hat{L} = c_1\bar{x}_1 + c_2\bar{x}_2 + \cdots + c_k\bar{x}_k$. In the 
Appendix of this chapter we show that the estimated variance for a contrast is given by 


$$\hat{\sigma}^2_{\hat{L}} = MS_w \sum_{i=1}^{k} c_i^2/n_i \qquad (13)$$


where $MS_w$ is the denominator of the F test and $n_i$ represents the number of 
subjects in the ith group. The standard error of the contrast is simply the square root of 
Equation 13. To illustrate calculation of a few Scheffé intervals we reconsider the 
data example used for the Tukey procedure in Section 2.11. There were 4 groups 
with differing group sizes and $MS_w$ = 22.22. 


Group             1     2     3     4 
$n_i$            20    17    14    18 
$\bar{x}_i$       7     8    10    13 


We test the following contrasts for significance at an experimentwise error rate 
= .10, i.e., we will obtain the 90% simultaneous confidence intervals: 


$$L_1 = \mu_1 - (\mu_2 + \mu_3)/2 \quad \text{and} \quad L_3 = \mu_1 - (\mu_2 + \mu_3 + \mu_4)/3$$

The estimates for contrasts $L_1$ and $L_3$ are given by 

$$\hat{L}_1 = 7 - (8 + 10)/2 = -2 \quad \text{and} \quad \hat{L}_3 = 7 - (8 + 10 + 13)/3 = -3.33$$


The standard error for $L_1$ is: 


$$\hat{\sigma}_{\hat{L}_1} = \sqrt{22.22[1^2/20 + (-.5)^2/17 + (-.5)^2/14]} = 1.355$$


and the standard error for $L_3$ is: 

$$\hat{\sigma}_{\hat{L}_3} = \sqrt{22.22[1^2/20 + (-.33)^2/17 + (-.33)^2/14 + (-.33)^2/18]} = 1.25$$


Also, $\sqrt{(4 - 1)F_{.10;3,65}} = \sqrt{3(2.18)} = 2.56$. 
The confidence interval for $L_1$ is given by 


(-2 - 1.355(2.56), -2 + 1.355(2.56)) or (-5.469, 1.469) 

while the confidence interval for $L_3$ is given by 

(-3.33 - 1.25(2.56), -3.33 + 1.25(2.56)) or (-6.53, -.13) 


Recall that if a confidence interval covers 0 it means the contrast is not 
significant. Therefore, $L_1$ is not significant. However, $L_3$ is significant since that interval 
does not cover 0. 
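
A minimal sketch of the Scheffé interval calculation for this example follows. The function name is hypothetical, and the F critical value of 2.18 is the tabled value used above, supplied as an argument rather than computed. Because the code uses the exact coefficient -1/3 instead of the rounded -.33, its limits differ from the hand calculation in the third decimal place.

```python
from math import sqrt


def scheffe_interval(coefs, means, sizes, ms_within, f_crit):
    """Scheffe simultaneous interval for the contrast sum(c_i * mu_i).

    f_crit is the tabled F critical value with (k - 1, N - k) degrees of freedom.
    """
    if abs(sum(coefs)) > 1e-9:
        raise ValueError("contrast coefficients must sum to 0")
    k = len(means)
    estimate = sum(c * m for c, m in zip(coefs, means))
    # Equation 13: estimated variance is MS_w * sum(c_i^2 / n_i)
    std_error = sqrt(ms_within * sum(c * c / n for c, n in zip(coefs, sizes)))
    half_width = std_error * sqrt((k - 1) * f_crit)
    return estimate - half_width, estimate + half_width


means = [7, 8, 10, 13]
sizes = [20, 17, 14, 18]

# Group 1 vs. the average of groups 2 and 3
l1 = scheffe_interval([1, -0.5, -0.5, 0], means, sizes, 22.22, f_crit=2.18)
# Group 1 vs. the average of groups 2, 3, and 4
l3 = scheffe_interval([1, -1 / 3, -1 / 3, -1 / 3], means, sizes, 22.22, f_crit=2.18)

print([round(v, 2) for v in l1])
print([round(v, 2) for v in l3])
```

The first interval covers 0 and the second does not, which matches the conclusions reached above.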


2.13 HETEROGENEOUS VARIANCES AND UNEQUAL 
GROUP SIZES 


As previously indicated, the analysis of variance is robust against unequal 
population variances provided that the group sizes are equal or approximately equal. 
When heterogeneous variances are present various procedures have been 
recommended: Welch (1951), Brown and Forsythe (1974), and the Kruskal-Wallis 
nonparametric test. A Monte Carlo study by Tomarken and Serlin (1986) examined 
the above three procedures and found that the Welch test was superior in most 
cases studied in terms of better control on type I error and greater power. 

In terms of post hoc procedures, recall that the Tukey maintained an honest 
experimentwise error rate with unequal group sizes only if the homogeneity of 
variance assumption is tenable and the assumed common n in the Tukey test 
statistic is replaced by the harmonic mean for each pair of groups. Fortunately, there is 
also a Welch t statistic which does not assume equal variances. Games and Howell 
(1976), in a Monte Carlo study on the Tukey procedure, found that the Welch 
approximate t statistic kept the experimentwise error rate under control when 
heterogeneous variances and unequal group sizes are both present. The Welch 
approximate t, which we denote by $t_w$, is given by 


$$t_w = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{s_i^2/n_i + s_j^2/n_j}}$$


where $s_i^2$ and $s_j^2$ are the sample variances for the ith and jth groups and $n_i$ and $n_j$ are 
the respective group sizes. Note that since the homogeneity of variance 
assumption is not tenable the Welch statistic uses only those variances for the pair of 
groups being compared. A pooled error term would be inappropriate since the 
sample variances are estimating different population values. The degrees of 
freedom ($\nu$) for each Welch statistic will in general be different and is given by 

$$\nu = \frac{(s_i^2/n_i + s_j^2/n_j)^2}{\dfrac{(s_i^2/n_i)^2}{n_i - 1} + \dfrac{(s_j^2/n_j)^2}{n_j - 1}}$$


A pair of means was declared to be significantly different in the Games and 
Howell study if 


$$|t_w| > q_{\alpha;k,\nu}/\sqrt{2}$$


Note that with the above approach several different critical values from the 
studentized range table will be needed, since $\nu$ will tend to be different for the 
various paired comparisons. 
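
The Welch statistic, its degrees of freedom, and the decision rule can be sketched as follows. The function names are hypothetical. The summary statistics are borrowed from groups 1 and 3 of the Section 2.11 example purely for illustration, and the studentized range value of 3.96 (4 groups, about 20 degrees of freedom, α = .05) is an approximate value that should be confirmed in Table B.2.

```python
from math import sqrt


def welch_t(mean_i, mean_j, var_i, var_j, n_i, n_j):
    """Welch approximate t for one pair of groups (separate variances)."""
    return (mean_i - mean_j) / sqrt(var_i / n_i + var_j / n_j)


def welch_df(var_i, var_j, n_i, n_j):
    """Approximate degrees of freedom for the Welch statistic."""
    a, b = var_i / n_i, var_j / n_j
    return (a + b) ** 2 / (a ** 2 / (n_i - 1) + b ** 2 / (n_j - 1))


def games_howell_significant(t_w, q_crit):
    """Decision rule: |t_w| must exceed q / sqrt(2); q_crit is a tabled value."""
    return abs(t_w) > q_crit / sqrt(2)


# Groups 1 and 3 of the Section 2.11 example: means 7 and 10,
# variances 16 and 36 (standard deviations 4 and 6), sizes 20 and 14.
t_w = welch_t(7, 10, 16, 36, 20, 14)
nu = welch_df(16, 36, 20, 14)
print(round(t_w, 3), round(nu, 2), games_howell_significant(t_w, q_crit=3.96))
```

Because the degrees of freedom change from pair to pair, `q_crit` has to be looked up separately for each comparison.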

Below are a few selected results from their study comparing the Welch statistic 
against a pooled error term ($MS_w$) approach for a 4 group situation: 


Group Sizes: 16, 14, 10, 6 

Population Variances        $MS_w$    Welch 

1, 3, 5, 7                  .122      .060 
1, 1, 7, 7                  .115      .064 
1, 1, 1, 13                 .112      .064 


These error rates are to be compared against a significance level of .05. The 
above situations all represent positively biased situations, i.e., where the large 
variances are associated with the small group sizes. Recall that for ANOVA these are 
the situations where it was liberal, with the error rate greater than the level of 
significance. The above results showed that use of a pooled error term caused the Tukey 
approach to be quite liberal, i.e., the actual error rate was over twice the 
significance level, while use of the Welch statistic kept the actual α quite close to the 
significance level of .05. 


Example 


To illustrate with an example, consider the four group data set in the table 
below. 

These data were run on SPSS for Windows 12.0, and a selected printout from 
that run is given in Table 2.7. 


Group 1 Group 2 Group 3 Group 4 
14 20 36 26 
21 25 29 35 
37 18 31 46 
18 30 22 18 
20 26 45 30 
29 22 43 33 
42 31 27 49 
12 26 33 15 
27 28 35 27 
30 24 28 
33 19 36 

17 40 
21 38 
23 29 
27 22 
32 
19 
29 
28 
23 


2.14 MEASURES OF ASSOCIATION (VARIANCE 
ACCOUNTED FOR) 


One of the facts of “statistical life” is that whether we obtain significance with any 
statistical test is heavily dependent on sample size. With large enough sample sizes 
even very small differences among the group means will be declared statistically 
significant. Why is this so? To shed some light on this it is helpful to first recall 
from Section 2.5 that the numerator of the F statistic, for equal group sizes, can be 
written as 


MS, = n (%; — xy? /(К—1) 


Thus, the F ratio can be written as 


Е nX (% — x)? K(k—1) 
Е MS, 


F 


Assuming the null hypothesis is false, the numerator can be made arbitrarily 
large by increasing the group sample size n. Now, increasing the sample sizes 
should not have any systematic effect on MS,,, and we assume for the sake of sim- 
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plicity that MS,, remains the same. But, given the above two statements, we see that 
F can be made arbitrarily large by increasing sample size. We now consider a nu- 
merical example to illustrate the above. Suppose two studies are done, each with 3 
groups, and the group means in both cases are 10, 14, and 18. Assume MS,, = 100 
in both cases. One study has 16 subjects per group while the other has 100 subjects 
per group. The grand mean = 14, and the F for the first study is 


Е = 16/2 [(10 — 14? + (18 — 14)2]/100 = 2.56 


The critical value for significance at .05 is 3.15, and therefore this result would 
not be significant. 
The other study, with the same mean differences, has an F of 


$$F = \frac{100}{2}\,[(10 - 14)^2 + (18 - 14)^2] / 100 = 16$$


and this would be significant well beyond the .001 level! 

Now, we illustrate the other point, that even very small differences will be 
declared significant if n is large enough. Suppose again 3 groups with $MS_w = 100$, 
but now there are 400 subjects per group. The means for the groups are 10, 11, and 
12. The F ratio here will be $F = 200[(10 - 11)^2 + (12 - 11)^2]/100 = 4$, which is 
significant at the .05 level, even though the mean differences are very small. To use 
a domestic analogy, this is like using a sledgehammer to pound out significance. 
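
The three F ratios above can be checked with a short Python sketch. It assumes equal group sizes and works only from the summary values quoted in the text (group means, n per group, and $MS_w = 100$); the function name `f_ratio_equal_n` is a hypothetical helper introduced here for illustration.

```python
def f_ratio_equal_n(group_means, n_per_group, ms_within):
    """F ratio for a one way ANOVA with equal group sizes, from summary values."""
    k = len(group_means)
    # With equal n, the grand mean is the simple average of the group means.
    grand_mean = sum(group_means) / k
    ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in group_means)
    ms_between = ss_between / (k - 1)
    return ms_between / ms_within


# The three scenarios discussed in the text (MS_w = 100 in each case).
scenarios = [
    ((10, 14, 18), 16),
    ((10, 14, 18), 100),
    ((10, 11, 12), 400),
]
for means, n in scenarios:
    f = f_ratio_equal_n(means, n, ms_within=100)
    print(f"means={means}, n per group={n}: F = {f:.2f}")
```

Holding the means and $MS_w$ fixed, F grows in direct proportion to n, which is the point of the example.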

Because of this kind of situation it has been argued for some time (perhaps 
popularized most by Hays in his influential text Statistics for Psychologists, 1963) that 
we need some way of determining whether a statistically significant result is 
practically significant. Hays (1963) introduced his $\hat{\omega}^2$ as a measure of association 
(strength of relationship) to get at practical significance, and such measures have 
subsequently been recommended by various textbook authors (Cohen & Cohen, 
1975; Kerlinger & Pedhazur, 1973; Kirk, 1982). The two most commonly used 
measures of this type are $\eta^2$ and Hays' $\hat{\omega}^2$. The formulas for each of them for a one 
way ANOVA are given below: 


$$\eta^2 = SS_b / SS_t$$
where $SS_t = SS_b + SS_w$ (total sum of squares), and 
$$\hat{\omega}^2 = (SS_b - (k - 1) MS_w) / (SS_t + MS_w)$$


Usually the numerical values for these two will not differ a great deal, although 
Hays’ measure is generally regarded as preferable because he used unbiased esti- 
mates in deriving his measure (but the measure itself is not unbiased). 
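
As a rough numerical illustration of the two formulas, the sketch below computes $\eta^2$ and $\hat{\omega}^2$ from summary values. It reuses the hypothetical studies from the F ratio example earlier in this section (means 10, 14, 18 and $MS_w = 100$, with 16 or 100 subjects per group), and it assumes equal group sizes so that $SS_w = MS_w \cdot k(n - 1)$. The function name is an assumption made for this example.

```python
def association_measures(group_means, n_per_group, ms_within):
    """Return (eta squared, omega squared) for an equal-n one way ANOVA."""
    k = len(group_means)
    grand_mean = sum(group_means) / k
    ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = ms_within * k * (n_per_group - 1)  # MS_w times its df
    ss_total = ss_between + ss_within
    eta_sq = ss_between / ss_total
    omega_sq = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
    return eta_sq, omega_sq


for n in (16, 100):
    eta_sq, omega_sq = association_measures((10, 14, 18), n, ms_within=100)
    print(f"n per group={n}: eta^2 = {eta_sq:.3f}, omega^2 = {omega_sq:.3f}")
```

In this illustration $\eta^2$ stays near .10 for both sample sizes, while the F ratio for the same two studies moves from 2.56 to 16.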

TABLE 2.7 
Selected Printout From SPSS for Windows 12.0 for Unequal Variances 
Example With Tamhane and Games-Howell Post Hoc Procedures 


Such measures can be useful after the test of significance, since they are 
essentially independent of sample size. Nevertheless, there are limitations associated 
with these measures, as O’Grady (1982) has pointed out in an excellent review on 
measures of explained variance. He cites 3 basic reasons why such measures 
should be interpreted with caution (measurement, methodological, and 
theoretical). With respect to measurement he notes that the reliability of the variables 
somewhat restricts how large a measure of association can be, since these 
measures are correlational in nature. Several methodological factors are mentioned; we 
discuss just two of them. One is the homogeneity of the population sampled. Since 
measures of association are correlational measures, the more homogeneous the 
population, the smaller the correlation will tend to be, and therefore the smaller the 
percent of variance that can be potentially accounted for. This is simply the 
restriction of range phenomenon you encountered in beginning statistics when studying 
the Pearson correlation. A second factor that can have a substantial effect on the 
magnitude of a measure of association is the number of levels chosen, and how 
they are chosen, for a fixed effects ANOVA (which is what we are dealing with). To 
illustrate he uses the following example. Suppose there are 3 researchers who wish 
to examine the relationship between a hypothesized carcinogen and the incidence 
of cancer. The first researcher chooses to contrast a control condition (0 exposure) 
with a 2% exposure to the carcinogen. The second researcher chooses to maximize 
the chances of finding a relationship and contrasts 0% exposure with 20% exposure. 
Finally, the third researcher is interested in determining the shape of the 
relationship across various levels of the supposed carcinogen. Below we present the 
descriptive statistics, F ratios, and eta squares for the 3 studies: 


                      Exposure Group 

Study          0%    2%    5%    10%    20% 
1      x̄       10    12 
       s        2     2                         F = 5,  η² = .22 
2      x̄       10                        18 
       s        2                         2     F = 80, η² = .82 
3      x̄       10    12    14    16     18 
       s        2     2     2     2      2      F = 25, η² = .69 


Ten subjects are assumed in each group in each study. Notice how the measure 
of association is drastically affected by the number of levels and how the levels are 
chosen, even though the means and standard deviations are the same. 
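
The F ratios and $\eta^2$ values for the three hypothetical carcinogen studies can be reproduced from the summary statistics alone. The sketch below assumes 10 subjects per group and a common standard deviation of 2, as stated in the text; the function name `f_and_eta_squared` is an illustrative helper, not part of any package.

```python
def f_and_eta_squared(group_means, sd, n_per_group):
    """F ratio and eta squared for an equal-n design with a common SD."""
    k = len(group_means)
    grand_mean = sum(group_means) / k
    ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in group_means)
    df_within = k * (n_per_group - 1)
    ss_within = df_within * sd ** 2  # each group contributes (n - 1) * s^2
    f_ratio = (ss_between / (k - 1)) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)
    return f_ratio, eta_sq


studies = {
    "Study 1 (0% vs 2%)": (10, 12),
    "Study 2 (0% vs 20%)": (10, 18),
    "Study 3 (0%, 2%, 5%, 10%, 20%)": (10, 12, 14, 16, 18),
}
for label, means in studies.items():
    f_ratio, eta_sq = f_and_eta_squared(means, sd=2, n_per_group=10)
    print(f"{label}: F = {f_ratio:.0f}, eta^2 = {eta_sq:.2f}")
```

The same dose-response relationship yields an $\eta^2$ anywhere from about .22 to .82, depending only on which exposure levels the researcher chose to include.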

A theoretical point O’Grady mentions, which should be kept in mind before 
casting aspersions on a “low” amount of variance accounted for, is that most 
behaviors have multiple causes, and hence it will be difficult in these cases to account 
for a large amount of variance with just a single cause (say treatments). 

Anyone planning on using measures of association in their research should read 
and think carefully about O’Grady’s paper. To reinforce the point that a “small” 
amount of variance accounted for may indeed be practically significant we 
consider an example from Rosenthal and Rosnow (1984). They consider the 
comparison of a treatment and control group where the dependent variable is dichotomous, 
whether the subjects live or die. The following table is presented: 


              Treatment Outcome 

              Alive    Dead    Total 
Treatment      66       34      100 
Control        34       66      100 
Total         100      100 


Since both variables are dichotomous, the phi coefficient $\phi$, a special case of the 
Pearson correlation for dichotomous variables (Glass & Hopkins, 1984), measures 
the relationship between them: 


$$\phi = (34^2 - 66^2) / \sqrt{100(100)(100)(100)} = -.32$$


Squaring (since it is a correlation) yields variance accounted for, which is 
$(-.32)^2 = .10$. Thus, the treatment-control distinction accounts for “only” 10% of 
the variance in the outcome. However, this is enough to increase the survival rate 
from 34% to 66%, far from trivial! The same type of interpretation would hold if 
we were to consider some less dramatic type of outcome like improvement vs. no 
improvement, where treatment was, say, a type of psychotherapy. Also, the inter- 
pretation is not confined to just a dichotomous measure. 
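
The phi coefficient for the survival table can be computed directly from the four cell counts. The sketch below follows the sign convention of the formula in the text (bc minus ad, with a and b the first row and c and d the second); the sign depends only on how the rows and columns are coded, so the squared value is what matters here. The function name is a hypothetical helper.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table with rows (a, b) and (c, d), using bc - ad."""
    numerator = b * c - a * d
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# Rows: treatment, control. Columns: alive, dead.
phi = phi_coefficient(66, 34, 34, 66)
print(f"phi = {phi:.2f}, variance accounted for = {phi ** 2:.2f}")
print(f"survival rate: control {34 / 100:.0%}, treatment {66 / 100:.0%}")
```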


2.15 PLANNED COMPARISONS 


One approach to the analysis of data is to first demonstrate overall significance, 
and then follow up to assess the significant subsources of variation (i.e., which par- 
ticular groups differed). This approach is appropriate in exploratory studies where 
it is necessary to first establish that an effect exists. There may be a weak literature 
base, or none on which to base specific hypotheses. This type of study is somewhat 
unfocused and some have even referred to these studies as “fishing expeditions.” 

Now we consider a more focused type of study, where there either is a fairly 
strong theoretical and/or literature base, or the investigator has specific questions 
to ask of the data. These questions will be in the form of hypotheses involving 
group comparisons. This is more of a confirmatory type study. Here, a priori, the 
investigator sets up planned comparisons among the group means. It is important 
to use planned comparisons when the situation justifies them, since performing a 
small number of statistical tests cuts down on the probability of spurious results 
(type I errors). 

Hays (1981) has shown that planned comparisons are a more powerful 
approach statistically. If we set up a small number of comparisons, then power will be 
enhanced and overall $\alpha$ can be controlled through the Bonferroni Inequality. This 
is a very important inequality. It states that if k hypotheses, k planned comparisons 
here, are tested separately with type I error rates of $\alpha_1, \alpha_2, \ldots, \alpha_k$, then 


$$\text{overall } \alpha \le \alpha_1 + \alpha_2 + \cdots + \alpha_k \tag{16}$$


If the hypotheses are each tested at the same alpha level, say $\alpha'$, then the 
Bonferroni upper bound becomes 


$$\text{overall } \alpha \le k\alpha' \tag{17}$$


If the comparisons are independent (this is defined shortly), then an exact 
calculation for overall $\alpha$ is available. First, $(1 - \alpha_1)$ is the probability of no type I error 
for the first comparison. Similarly, $(1 - \alpha_2)$ is the probability of no type I error for 
the second, $(1 - \alpha_3)$ the probability of no type I error for the third, etc. If the tests 
are independent, then we can multiply probabilities. Therefore, 
$(1 - \alpha_1)(1 - \alpha_2) \cdots (1 - \alpha_k)$ is the probability of no type I errors for all k tests. Thus, 


$$\text{overall } \alpha = 1 - (1 - \alpha_1)(1 - \alpha_2) \cdots (1 - \alpha_k) \tag{18}$$


is the probability of at least one type I error. If the tests are not independent, then 
overall $\alpha$ will still be less than given in Equation 18 although it is very difficult to 
calculate. If we set the alpha levels equal, say to $\alpha'$, for each test, then Equation 18 
becomes overall $\alpha = 1 - (1 - \alpha')(1 - \alpha') \cdots (1 - \alpha') = 1 - (1 - \alpha')^k$. This expression, 
$1 - (1 - \alpha')^k$, is approximately equal to $k\alpha'$ for small $\alpha'$. The table below compares 
the two for $\alpha'$ = .05, .01, and .001 for a number of tests ranging from 5 to 100. 


                    α′ = .05                α′ = .01                α′ = .001 
No. of Tests   1-(1-α′)^k    kα′       1-(1-α′)^k    kα′       1-(1-α′)^k    kα′ 
      5           .226       .25          .049       .05         .00499      .005 
     10           .401       .50          .096       .10         .00990      .010 
     15           .537       .75          .140       .15         .0149       .015 
     30           .785      1.50          .260       .30         .0296       .030 
     50           .923      2.50          .395       .50         .0488       .050 
    100           .994      5.00          .634      1.00         .0952       .100 


First, the numbers in the table greater than 1 don’t represent probabilities, since a 
probability can’t be greater than 1. Second, note that if we are testing each of a 
large number of hypotheses at the .001 level, the difference between $1 - (1 - \alpha')^k$ 
and the Bonferroni upper bound of $k\alpha'$ is very small and of no practical 
consequence. Also, the differences between $1 - (1 - \alpha')^k$ and $k\alpha'$ when testing at $\alpha'$ = .01 
are small for up to about 30 tests. For more than about 30 tests $1 - (1 - \alpha')^k$ 
provides a tighter bound and should be used. When testing at the $\alpha'$ = .05 level, $k\alpha'$ is 
okay for up to about 10 tests, but beyond that $1 - (1 - \alpha')^k$ is much tighter and should 
be used. 
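
The comparison in the table can be regenerated with a few lines of Python. This is a minimal sketch assuming k independent tests, each run at the same per-test level; the two function names are hypothetical helpers introduced for this example.

```python
def exact_overall_alpha(alpha_per_test, k):
    """Probability of at least one type I error in k independent tests."""
    return 1 - (1 - alpha_per_test) ** k


def bonferroni_bound(alpha_per_test, k):
    """Bonferroni upper bound on overall alpha (may exceed 1)."""
    return k * alpha_per_test


print("k     alpha'   exact    bound")
for alpha_per_test in (0.05, 0.01, 0.001):
    for k in (5, 10, 15, 30, 50, 100):
        exact = exact_overall_alpha(alpha_per_test, k)
        bound = bonferroni_bound(alpha_per_test, k)
        print(f"{k:<5d} {alpha_per_test:<8} {exact:<8.4f} {bound:.3f}")
```

The exact value never exceeds the bound, and the gap between them widens as either the number of tests or the per-test level grows.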


Example 1 


We have a 4 group problem with 3 planned comparisons and want overall $\alpha$ < .10. 
This can be achieved by simply dividing the overall $\alpha$ by the number of tests done, 
i.e., .10/3 = .033. Thus, if each comparison is tested at the $\alpha$ = .033 level, we are 
assured by the Bonferroni inequality that 


$$\text{overall } \alpha \le .033 + .033 + .033 = .10$$


Example 2 


Suppose there are 5 planned comparisons in a 6 group problem. If we test the first 
two at the .05 level (i.e., $\alpha_1$ = .05 and $\alpha_2$ = .05), and the remaining three 
comparisons at the .01 level ($\alpha_3 = \alpha_4 = \alpha_5$ = .01), then we are assured by the inequality that 


$$\text{overall } \alpha \le .05 + .05 + .01 + .01 + .01 = .13$$

Now let us consider a couple of research examples of setting up planned 
comparisons. The next example has treatments of a structure that could be useful in a 
variety of fields. 


Example 3 


Consider a four group situation involving a comparison of two treatments, a 
combination of the two treatments, and a control group on some dependent measure. 
Schematically, we have 


T1 (control)    T2    T3    T4 (T2 and T3 combined) 

    μ1          μ2    μ3    μ4 


The two treatments might be two reading methods, two types of counseling, 
two diets, etc. Of course, the two treatments would have to be such that combining 
them made sense. Now there are three very meaningful, focused questions to ask of 
the data. 


1. Is something better than nothing? Here we are comparing the control group 
vs. the treatment groups. 

2. Do the two individual treatments differ in effectiveness? 

3. Is the combination of treatments more effective than either treatment indi- 
vidually? 


These comparisons will be set up as contrasts among the population means for 
the groups. In general, for k groups with population means $\mu_1, \mu_2, \ldots, \mu_k$, a contrast 
among the population means is given by 


$$L = c_1\mu_1 + c_2\mu_2 + \cdots + c_k\mu_k$$


where the sum of the coefficients ($c_i$) must equal 0. 
Let us set up the comparisons for the 3 questions above, and we will see that 
each is a contrast: 


$$L_1 = \mu_1 - (\mu_2 + \mu_3 + \mu_4)/3$$


The coefficients here are $c_1 = 1$, $c_2 = c_3 = c_4 = -1/3$. The sum of these 
coefficients = 0, so $L_1$ is a contrast. 


$$L_2 = \mu_2 - \mu_3$$
The coefficients are $c_2 = 1$ and $c_3 = -1$, so that $c_2 + c_3 = 0$ and $L_2$ is a contrast. 
$$L_3 = \mu_4 - (\mu_2 + \mu_3)/2$$


Here $c_4 = 1$, $c_2 = c_3 = -.5$, so that $\sum c_i = 0$, and $L_3$ is a contrast. 
The formula for the sum of squares of a contrast is given by 


$$SS_L = \hat{L}^2 / \sum (c_i^2 / n_i)$$
where $\hat{L}$ is the estimate of the contrast and the $n_i$ are the group sizes. 

For equal group size the above set of three contrasts represents orthogonal 
comparisons. The sums of squares associated with the contrasts (denote them by $SS_L$) 
are independent, and it can be shown that: 


$$SS_b = SS_{L_1} + SS_{L_2} + SS_{L_3} \tag{19}$$


That is, the overall between groups variation is additively partitioned into three 
independent pieces of variation. For equal group size the condition that needs to be 
met for a pair of contrasts to be independent is that the sum of the products of the 
coefficients equal 0. We now show that this condition is met for all three pairs of 
contrasts. We present the contrasts below in schematic form, just using the 
coefficients that define each contrast: 


        T1      T2      T3      T4 
L1       1    -1/3    -1/3    -1/3 
L2       0      1      -1       0 
L3       0    -1/2    -1/2      1 


The sums of the products of the coefficients for each pair are: 


L1 and L2: 1(0) + (-1/3)(1) + (-1/3)(-1) + (-1/3)(0) = 0 
L1 and L3: 1(0) + (-1/3)(-1/2) + (-1/3)(-1/2) + (-1/3)(1) = 0 
L2 and L3: 0(0) + 1(-1/2) + (-1)(-1/2) + 0(1) = 0 


There are other sets of three orthogonal comparisons for this problem, and any 
such set will also provide for an additive partitioning of $SS_b$. The nice feature 
about orthogonal contrasts is that significance on one contrast implies nothing 
about potential significance on another contrast. That is, we do not have a con- 
founding of the sources of variation. With correlated contrasts the sources of varia- 
tion are confounded; however, the unique sum of squares associated with each 
contrast can be obtained by using the SPSS MANOVA program, which since Re- 
lease 2.2 has the unique sum of squares as the default option. Although it is desir- 
able to have orthogonal contrasts, the set of contrasts to impose in a given situa- 
tion should be dictated by the research questions of the investigator. 

Now we express the condition for independence of two contrasts for equal 
group size in general form. Consider two general contrasts for k groups: 


$$L_1 = c_{11}\mu_1 + c_{12}\mu_2 + \cdots + c_{1k}\mu_k$$


$$L_2 = c_{21}\mu_1 + c_{22}\mu_2 + \cdots + c_{2k}\mu_k$$


The condition for independence is 
$$c_{11}c_{21} + c_{12}c_{22} + \cdots + c_{1k}c_{2k} = 0$$


If the group sizes are not equal, then the condition for independence is more 
complicated and becomes: 


$$(c_{11}c_{21})/n_1 + (c_{12}c_{22})/n_2 + \cdots + (c_{1k}c_{2k})/n_k = 0$$


Example 4 


A medical researcher wishes to evaluate the effectiveness of 4 drugs on reaction 
time. Schematically, the design is 


              One Generic Type       Other Generic Type 
Control      Drug A      Drug B      Drug C      Drug D 
  μ1           μ2          μ3          μ4          μ5 


A set of 4 focused and relevant questions to ask here are: 


1. Are drugs more effective than no drugs? 
$$L_1 = \mu_1 - (\mu_2 + \mu_3 + \mu_4 + \mu_5)/4$$
2. Is one generic type of drug more effective than the other generic type? 
$$L_2 = (\mu_2 + \mu_3)/2 - (\mu_4 + \mu_5)/2$$
3. Are the two drugs of the first generic type different in effectiveness? 
$$L_3 = \mu_2 - \mu_3$$
4. Are the two drugs of the other generic type different in effectiveness? 
$$L_4 = \mu_4 - \mu_5$$


The reader should verify that each of these comparisons is indeed a contrast. 
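
Checks of this kind are easy to automate. The sketch below, offered as an illustration rather than as part of the SPSS or SAS workflow, tests whether a set of coefficients sums to 0 and whether two contrasts satisfy the independence condition, for equal or unequal group sizes. Exact fractions are used so that coefficients such as 1/3 do not produce rounding error. The function names and the unequal group sizes at the end are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def is_contrast(coeffs):
    """A set of coefficients defines a contrast if it sums to 0."""
    return sum(Fraction(c) for c in coeffs) == 0


def are_orthogonal(c1, c2, group_sizes=None):
    """Sum of products of coefficients, each divided by n_i, must equal 0.

    With equal group sizes the n_i cancel, so group_sizes may be omitted.
    """
    if group_sizes is None:
        group_sizes = [1] * len(c1)
    total = sum(Fraction(a) * Fraction(b) / n
                for a, b, n in zip(c1, c2, group_sizes))
    return total == 0


half, third, quarter = Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)

example_3 = {
    "L1": [1, -third, -third, -third],
    "L2": [0, 1, -1, 0],
    "L3": [0, -half, -half, 1],
}
example_4 = {
    "L1": [1, -quarter, -quarter, -quarter, -quarter],
    "L2": [0, half, half, -half, -half],
    "L3": [0, 1, -1, 0, 0],
    "L4": [0, 0, 0, 1, -1],
}

for name, contrasts in (("Example 3", example_3), ("Example 4", example_4)):
    print(name)
    for label, coeffs in contrasts.items():
        print(f"  {label} is a contrast: {is_contrast(coeffs)}")
    for (la, ca), (lb, cb) in combinations(contrasts.items(), 2):
        print(f"  {la} and {lb} orthogonal (equal n): {are_orthogonal(ca, cb)}")

# Hypothetical unequal group sizes: orthogonality can be lost.
sizes = [10, 8, 11, 13]
print(are_orthogonal(example_3["L2"], example_3["L3"], sizes))
```

With the hypothetical unequal group sizes, the last check fails for L2 and L3 of Example 3, which is why the more complicated condition is needed when the n's differ.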


2.16 TEST STATISTIC FOR PLANNED COMPARISONS 
Recall that for k groups with population means $\mu_1, \mu_2, \ldots, \mu_k$, a contrast (L) among 
the population means is given by 
$$L = c_1\mu_1 + c_2\mu_2 + \cdots + c_k\mu_k$$


where $\sum c_i = 0$. 
This contrast is estimated by replacing the population means by the sample 
means, yielding 
$$\hat{L} = c_1\bar{x}_1 + c_2\bar{x}_2 + \cdots + c_k\bar{x}_k$$
To test whether a given contrast is significantly different from 0, i.e., to test 
$$H_0: L = 0 \quad \text{vs.} \quad H_1: L \ne 0$$


we need an expression for the variance of a contrast. We show in the Appendix at 
the end of this chapter that the variance for a contrast is given by 


$$\hat{\sigma}_{\hat{L}}^2 = MS_w \sum (c_i^2 / n_i)$$
where $MS_w$ is the error term from all the groups (the denominator of the F test) and 
the $n_i$ are the group sizes. 

Therefore, the following F statistic can be used to test a contrast for 
significance, 


$$F = \frac{\hat{L}^2}{MS_w \sum (c_i^2 / n_i)} = \frac{\hat{L}^2 / \sum (c_i^2 / n_i)}{MS_w} \tag{20}$$


with 1 and (N - k) degrees of freedom. Note here that each contrast has just one 
degree of freedom. As Hays and others have indicated, the (k - 1) between group 
degrees of freedom can be partitioned into (k - 1) non-redundant single degree of 
freedom contrasts. 

Note that if the group sizes are equal ($n_1 = n_2 = \cdots = n_k = n$), then Equation 20 can 
be written in somewhat simpler form as: 


$$F = \frac{n\hat{L}^2 / \sum c_i^2}{MS_w}$$


Also, some authors present the test statistic for planned comparisons as a t 
statistic: 


$$t = \frac{\hat{L}}{\sqrt{MS_w \sum (c_i^2 / n_i)}} \tag{21}$$


Since it can be shown that $F = t^2$, the F test is equivalent to a two tailed t test at 
the same level of significance. SPSS, as shown in Table 2.10, uses the t statistic for 
contrasts. The probabilities given are for a two tailed test, so that if you are testing a 
directional hypothesis with your contrast the probability value should be divided 
by two. 


Numerical Example 


Suppose an investigator has a 4 group problem and wishes to examine the follow- 
ing planned comparisons: 


            Groups 
        1     2     3     4      (in population means) 
L1      1   -.5   -.5     0      L1 = μ1 - (μ2 + μ3)/2 
L2     .5    .5   -.5   -.5      L2 = (μ1 + μ2)/2 - (μ3 + μ4)/2 
L3      0     0     1    -1      L3 = μ3 - μ4 


Notice that on the left the contrasts are indicated schematically by simply using 
the coefficients for the population means. This is the way contrasts are input for 
SPSS and SAS. 

Suppose we have the following descriptive information for the groups: 


Group     1      2      3      4 
n_i      10      8     11     13 
x̄_i     5.6    7.3    8.1    4.2 


and it is known that the pooled error term for the groups is $MS_w = 8.7$. 

We now show how to calculate the test statistic given in Equation 20 for testing 
contrasts 1 and 3, and leave the calculation of contrast 2 as an exercise for the 
reader. 


Contrast 1 
First we obtain the estimate of the contrast using the sample means: 
$$\hat{L}_1 = 5.6 - (7.3 + 8.1)/2 = 5.6 - 7.7 = -2.1$$
Also, we have 
$$\sum c_i^2 / n_i = 1^2/10 + (-.5)^2/8 + (-.5)^2/11 = .154$$
Thus, 


$$F = \frac{(-2.1)^2 / .154}{8.7} = 3.29$$
The critical value at .05 for this contrast is $F_{.05;\,1,38} = 4.08$. Therefore, this 
contrast would not be significant. Note that from the above critical value the between 
degrees of freedom for the contrast is 1. 


Contrast 3 
The estimate of the contrast is given by 
$$\hat{L}_3 = 8.1 - 4.2 = 3.9$$
Also, we have 
$$\sum c_i^2 / n_i = 1^2/11 + (-1)^2/13 = .168$$
Thus, 


$$F = \frac{(3.9)^2 / .168}{8.7} = 10.406$$


The critical value for this contrast at the .05 level is the same as for contrast 1, 
i.e., 4.08. Thus, contrast 3 is significant. 
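
The hand calculations for contrasts 1 and 3 can be reproduced with a small Python function. This is a sketch built only from the summary values given above (group sizes, sample means, and $MS_w$ = 8.7); the function name `contrast_test` is a hypothetical helper. Because the code carries full precision instead of rounding $\sum c_i^2/n_i$ to three decimals, its F values differ slightly in the second decimal place from the hand-computed ones.

```python
import math


def contrast_test(coeffs, means, group_sizes, ms_within):
    """Return (estimate, F, t) for a single planned comparison."""
    estimate = sum(c * m for c, m in zip(coeffs, means))
    scale = sum(c ** 2 / n for c, n in zip(coeffs, group_sizes))
    f_stat = estimate ** 2 / (ms_within * scale)
    t_stat = estimate / math.sqrt(ms_within * scale)
    return estimate, f_stat, t_stat


group_sizes = [10, 8, 11, 13]
means = [5.6, 7.3, 8.1, 4.2]
ms_within = 8.7
critical_f = 4.08  # value quoted in the text for the .05 level

contrasts = {
    "Contrast 1": [1, -0.5, -0.5, 0],
    "Contrast 3": [0, 0, 1, -1],
}
for label, coeffs in contrasts.items():
    estimate, f_stat, t_stat = contrast_test(coeffs, means, group_sizes, ms_within)
    verdict = "significant" if f_stat > critical_f else "not significant"
    print(f"{label}: estimate = {estimate:.2f}, F = {f_stat:.2f}, "
          f"t = {t_stat:.2f}, {verdict}")
```

The t value returned for each contrast squares to the corresponding F, in line with the equivalence noted after Equation 21.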


2.17 PLANNED COMPARISONS ON SPSS AND SAS 


To illustrate how to set up planned comparisons on the statistical packages, and 
how to interpret the output, we ran the data for Example 4 (Section 2.15) involving 
the effect of different generic type drugs on reaction time. The complete SAS con- 
trol lines are given in Table 2.8. To obtain the planned comparisons on SPSS we 
used the Windows 12.0 version. 

Obtaining planned comparisons on SPSS for Windows is a simple matter. One 
first clicks on STATISTICS, then on COMPARE MEANS, and finally 
on ONE WAY ANOVA. When all of this is done the first screen displayed in 
Table 2.9 appears. Make REACTIME the dependent variable and DRUG the factor 
(grouping variable), and then click on CONTRASTS. When this is done the 
middle screen in Table 2.9 appears. Recall that for this example we have 5 groups, so 
we can have at most 4 contrasts. Enter a coefficient for each group (category) of the 
factor variable and click ADD after each entry. Each new value is added to the 
bottom of the coefficient list. To specify additional contrasts, click NEXT. For the 
current example, when the last contrast is entered the bottom screen appears. Click 
on CONTINUE and then click on OK to run the contrasts. 

Selected annotated printout from the SPSS and SAS runs is presented in Table 
2.10. 
TABLE 2.8 
SAS Control Lines for Planned Comparisons on Drug Data* 


DATA CONTRAST; 
INPUT DRUG REACTIME @@; 
IM 


9 181 11 11 1959 
9 


18 2 5 2 122 11 2 12 
19 

3103437 3 23 
43 
241049 4 13 4114 9 
13 4 9 
7 5b 115 125 95 14 5 16 

24 5 19 

PROC PRINT; 

PROC MEANS; 

BY DRUG; 

PROC GLM; 

CLASS DRUG; 

MODEL REACTIME = DRUG; 

CONTRAST 'DRUG VS NO DRUG' DRUG 1 -.25 -.25 -.25 -.25; 
CONTRAST 'GENTYPE1 VS GENTYPE2' DRUG 0 .5 .5 -.5 -.5; 
CONTRAST 'DRUG A VS DRUG B' DRUG 0 1 -1 0 0; 

CONTRAST 'DRUG C VS DRUG D' DRUG 0 0 0 1 -1; 


C9 CJ Юю М 
- 


E 
1 
1 
2 
2 
31 
3 
4 
4 
5 
5 


* Recall that the design was 


              One Generic Type       Other Generic Type 
Control      Drug A      Drug B      Drug C      Drug D 
  μ1           μ2          μ3          μ4          μ5 


2.18 THE EFFECT OF AN OUTLIER ON AN ANOVA 


In Chapter 1 we indicated the importance of outliers and showed that they can have 
a dramatic effect on the results of a statistical analysis. Here we illustrate that effect 
for a one way analysis of variance. 


Example 


An investigator has collected the following data: 

Gp 1    Gp 2    Gp 3 
 15      17       6 
 18      22       9 
 12      15      12 
 12      12      11 
  9      20      11 
 10      14       8 
 12      15      13 
 20      20      30 
         21       7 

The score of 30 in group 3 is an outlier. With that case in the ANOVA we do not 
find significance (F = 2.61, p < .095) at the .05 level, while with the case deleted 
we do find significance well beyond the .01 level (F = 11.18, p < .0004). Deleting 
the case has the effect of producing greater separation among the means, since the 
means with the case included are (13.5, 17.33, 11.89), while the means with the 
case deleted are (13.5, 17.33, 9.63). It also has the effect of reducing the within 
group variability in group 3 substantially, and hence the pooled within group vari- 
ability (the error term for ANOVA) will be much smaller. 
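
The effect of the outlier can be verified with a short one way ANOVA written in plain Python. This is a sketch, not SPSS or SAS output: the function `one_way_f` is a hypothetical helper, and group 3 is entered with the score of 30 that the text identifies as the outlier (the group means quoted above, 11.89 with the case and 9.63 without it, are consistent with these scores).

```python
def one_way_f(groups):
    """F ratio for a one way ANOVA on raw scores (unequal n allowed)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


gp1 = [15, 18, 12, 12, 9, 10, 12, 20]
gp2 = [17, 22, 15, 12, 20, 14, 15, 20, 21]
gp3_with_outlier = [6, 9, 12, 11, 11, 8, 13, 30, 7]
gp3_without_outlier = [x for x in gp3_with_outlier if x != 30]

print(f"F with outlier:    {one_way_f([gp1, gp2, gp3_with_outlier]):.2f}")
print(f"F without outlier: {one_way_f([gp1, gp2, gp3_without_outlier]):.2f}")
```

Removing the single case both pulls the group 3 mean away from the others and shrinks the pooled within group variance, so F rises from about 2.61 to about 11.18.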


2.19 MULTIVARIATE ANALYSIS OF VARIANCE 


In this chapter we have considered what is called univariate analysis of variance, 
since there is just one dependent variable in the analysis. In many studies, however, 
the subjects are measured on several variables. The appropriate statistical analysis 
for comparing the k groups on the p dependent variables simultaneously is called 
multivariate analysis of variance (MANOVA). This type of analysis is to be distin- 
guished from doing a separate univariate ANOVA on each dependent variable. 
Four reasons why a MANOVA is preferable to such separate univariate analyses 
are: 

1. The univariate analyses, especially for a moderate or large number of 
dependent variables, allow the overall type I error rate to go completely out of 
control. The situation here is analogous to what happened when doing several t tests 
for the k group problem. 

2. The univariate ANOVAs ignore important information, such as the correla- 
tions among the dependent measures, whereas the multivariate tests incorporate 
these correlations into the test statistics. 

3. The univariate tests may not show the groups to be significantly different on 
any of the variables, because of small unreliable differences on each of the 
variables. However, if measures are considered jointly (as in MANOVA) there may be 
a significant difference. That is, small differences on each of the variables may 
combine to produce a reliable overall difference. 

4. If treatments affect the dependent variables in different ways, and the 
dependent variables are at least moderately correlated within groups, the multivariate 
approach will be quite powerful and can detect differences that the univariate tests 
cannot. One of the exercises illustrates this situation. 


TABLE 2.9 
SPSS for Windows 12.0 Screens for Obtaining Planned Comparisons 


TABLE 2.10 
Selected Printout From SPSS and SAS for Planned Comparisons 
on Drug Data Example 


① As indicated in 2.16, one can use a two tailed t test or an F test to check the contrast for 
significance. SPSS chooses to use the t test whereas SAS reports the F. The two are equivalent, as an 
examination of the tail probabilities shows. 

It can be shown that $t^2 = F$, and note here that $(-2.766)^2 = 7.65$, $(.719)^2 = .52$, etc. 

Suppose a priori we wished overall $\alpha \le .05$ for the set of 4 statistical tests (contrasts). By doing each 
test at the .05/4 level of significance we are assured by the Bonferroni inequality that 

$$\text{overall } \alpha \le .0125 + .0125 + .0125 + .0125 = .05$$
② Using $\alpha$ = .0125, we find by examining the tail probabilities that only contrast 1 is significant. 


③ Since the contrasts are orthogonal, the sums of squares (184.9 + 12.5 + 36 + 81) add up to the 
between sum of squares, i.e., 314.4. 


2.20 SUMMARY 


1. The analysis of variance (ANOVA) is appropriate for comparing k independ- 
ent groups on a single dependent variable. It is the generalization of the t test, 
which is used to compare two groups. 

2. It was shown how the use of multiple t tests for the k group problem allows 
the overall $\alpha$ level to get out of control, hence the need for ANOVA. 

3. In testing the null hypothesis of equal population means, the ANOVA com- 
putes and compares two basic sources of variation (between and within). Between 
group variability measures variability of the group mean about the grand mean, 
while within group variability measures how much subjects vary who are treated 
alike. 

4. It was shown that $MS_w$ and $MS_b$ both represent variances. 

5. Analysis of variance rests on three assumptions: normality of scores in each 
group, equal population variances, and independence of the observations. 
Considerable research on violations of assumptions suggests that ANOVA is robust with 
respect to a violation of the normality assumption. It is robust against unequal 
variances provided that the group sizes are equal or approximately equal 
(largest/smallest < 1.5). ANOVA is severely affected by correlated observations. Two 
methods are suggested for dealing with correlated observations. One is to simply 
test at a more stringent $\alpha$ level. The other, if dealing with several small groups 
within each treatment, is to use the group mean as the unit of analysis. 

6. After a significant F several post hoc procedures are mentioned for locating 
where the differences lie. For most situations the Tukey procedure is preferred 
for 3 reasons: (a) it does control the overall $\alpha$, (b) it is fairly powerful for 
detecting differences, and (c) it examines a meaningful and easily interpreted set of 
comparisons (pairwise comparisons). For more extensive data-snooping, the 
Scheffé procedure should be used. This procedure is quite conservative, however, 
so for more adequate power one will either need to have a large number of subjects 
or set overall $\alpha$ at .10 or .15. The Dunnett procedure should be used if you are only 
interested in comparing each treatment group against the control group. 

7. For situations where the homogeneity of variance assumption is not tenable 
and there are sharply unequal group sizes, an ANOVA and post hoc procedure that 
do not assume equal variances should be used. In Section 2.13 we illustrated two 
post hoc procedures (Tamhane and Games-Howell), which do not assume equal 
variances. 

8. Confidence intervals and measures of association (variance accounted for) 
are mentioned as two ways of determining the practical significance of a study. 
Several cautions to be observed in using measures of association are mentioned, 
and an example is given to illustrate that a “small” amount of variance accounted 
for could indeed be practically significant. 

9. Planned comparisons are presented as an alternative way to analyze the k 
group problem. Here the researcher a priori is setting up comparisons among the 
group means corresponding to his or her hypotheses. An overall F test is not 
required in this approach. Planned comparisons are a more powerful approach, and 
overall $\alpha$ can be controlled through use of the Bonferroni Inequality. 


EXERCISES 


1. For a 7 group problem there would be 21 t tests for the 21 paired 
comparisons. Using this approach, rather than the proper one way ANOVA, what 
would overall $\alpha$ be approximately (if each t test is done at the .05 level)? 


2. (a) If k = 4 and $n_1 = n_2 = n_3 = n_4 = 20$, then $df_b$ = ?, $df_w$ = ? 
(b) If k = 3 and $n_1 = 12$, $n_2 = 25$, and $n_3 = 20$, then $df_b$ = ?, $df_w$ = ? 


3. (a) Find the critical value at the .05 level for a 3 group problem with 10 
subjects per group. 
(b) Find the critical value at the .10 level for a 4 group problem with 20 
subjects per group. 
(c) Find the critical value at the .01 level for a 5 group problem with 8 
subjects per group. 


4. (a) Do a one way analysis of variance on the following data, testing for 
significance at the .10 level. 


Treat 1    Treat 2    Treat 3 
  11         10         15 
  14          8         14 
  13         12         10 
  17          7         11 
(b) Obtain $MS_w$ by computing the variances for each group using your 
TI-30Xa STAT calculator, or some other calculator that yields means and 
variances. 


5. Do a one way ANOVA on these data; test for significance at .05: 


Group 1 Group 2 Group 3 Group 4 
2 7 4 8 
3 9 4 + 
5 11 5 7 
6 8 7 
3 


In obtaining $MS_w$ with your calculator, note that 


$$SS_w = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$$


where $s_1^2, s_2^2, \ldots, s_k^2$ are sample variances for groups. 


6. As part of a study by Sarachen-Deily (1985), deaf high school students 
were classified as good readers, average readers, or poor readers on the 
basis of scores on the Stanford Achievement Test, Special Edition for 
Hearing Impaired Students. The following table resulted: 


Category n Mean Stand. Deviation 
Good reader 7 6.90 49 
Aver. reader 9 5.04 56 
Poor reader 4 3.38 10 


An ANOVA on this data yielded a significant overall difference at the .01 
level. 

(a) Are the sample variances for the groups sharply unequal? 

(b) Would you be worried about the significant ANOVA result being spu- 
rious? Why, or why not? 


7. A 4-group ANOVA is run on the following data: 


        Gp 1    Gp 2    Gp 3    Gp 4 
n_i      15      15      15      15 
x̄       5.6     7.3     4.1     8.7 


The F statistic is F = 63.73/22.35 = 2.85, p < .10. 

Apply the Tukey procedure at the .10 level to determine which pairs of 
groups are significantly different. 


8. A study by Smith, Jones, and Waugh (1986) evaluated the effect of 
interactive computer assisted videodisc laboratory simulations in enhancing 
achievement in a freshman college chemistry course. A group of 103 
students were randomly assigned to one of three groups: (a) the first group 
was required to complete a series of interactive videodisc lessons on 
chemical equilibrium in place of laboratory work (the VDISC group), (b) the 
second group was required to complete only a traditional laboratory 
experiment on the same content material (the LAB group), and (c) the third group 
was required to complete the interactive videodisc lessons before the 
traditional laboratory experiment (VDISC+LAB group). Following these 
treatments all students took a seven item multiple choice quiz that required 
them to apply knowledge of chemical equilibrium to solve both familiar 
and unfamiliar systems, with the following results: 


VDISC VDISC+LAB LAB 
n 21 17 49 
x 5.810 5.588 4.163 
sd 1.167 1.121 1.519 


A one way ANOVA on these data yielded F = 13.84, p < .00005. 

(a) The group sizes are sharply unequal. Would you be worried about the 
homogeneity of variance assumption? Why, or why not? 

(b) Apply the Tukey procedure with overall $\alpha = .05$ to determine which
pairs of groups differ significantly.

(c) Does there appear to be practical significance? In answering this, cal- 
culate the effect size(s) for the pair(s) that are significantly different. 


9. One of the planned comparisons involved in the numerical example in 2.16
was

$$L_1 = (\mu_1 + \mu_2)/2 - (\mu_3 + \mu_4)/2$$

Test this contrast for significance at the .05 level.


10. For example 3 in Section 2.15 the following three contrasts were defined:
$L_1 = \mu_1 - (\mu_2 + \mu_3 + \mu_4)/3$, $L_2 = \mu_2 - \mu_3$, and
$L_3 = \mu_4 - (\mu_2 + \mu_3)/2$. Label the treatment variable TREAT and the
dependent variable DEP. Show the complete control lines for running the above
three contrasts on SPSS and on SAS. For the data lines element just put DATA.
11. O’Grady, in his paper on measures of association, states:


For instance, examining the relationship between the five different types of fit- 
ness programs and subsequent efficiency of the heart would, I suspect, produce 
quite different measures of explained variance for a population of runners in 
comparison with a population of sedentary individuals (which, in turn, would 
no doubt produce a quite different value than would be found for a random sam- 
ple of the American adult population). (p. 773) 


What does this relate to that was discussed in the chapter on measures of 
association? 


12. Consider the following data for three groups of subjects on two dependent
variables $y_1$ and $y_2$:


 Group 1      Group 2      Group 3
 y1   y2      y1   y2      y1   y2
  3    7       4    5       5    5
  4    7       4    6       6    5
  5    8       5    7       6    6
  5    9       6    7       7    7
  6   10       6    8       7    8


(a) Run a one way ANOVA for $y_1$ and for $y_2$ on SAS and a one way
multivariate analysis of variance on SAS. All three analyses are obtained in
one run. Simply put this in the MODEL statement:


MODEL Y1 Y2 = GPID;


(b) Is there a significant difference for $y_1$ at the .05 level? For $y_2$ at
the .05 level?

(c) Are the multivariate tests significant at the .05 level?

(d) In discussing the results from (b) and (c), first look at the pattern of
means for $y_1$ and $y_2$ over the three groups. Are the patterns different?
Another factor that is important is the within group correlation for $y_1$ and
$y_2$, as this has a strong influence on the magnitude of the error term for
the multivariate tests (see Stevens, 1986, Chapter 4).


13. An auditor from the Internal Revenue Service wishes to compare the
efficiency of four regional tax processing centers. A random sample of 10
returns is selected from each center, and the number of days between receipt
of the tax returns and final processing is determined. The results (in days)
are as follows:


East Midwest South West 
49 47 39 52 
54 56 55 42 
40 40 48 57 
60 51 43 46 
43 55 50 50 
65 36 63 34 
59 38 48 40 
70 52 57 51 
61 41 49 39 
48 43 65 36 


(a) Run these data on SAS to determine whether there is a difference in
average processing time among the 4 centers at the .05 level.

(b) Are there any significant pairwise differences at the .05 level with the 
Scheffe procedure? With the Tukey procedure? 

(c) Explain the difference between the results found in (b) for the two 
parts of the problem. 


14. An investigator randomly assigns 20 subjects to each of four groups (two
control and two treatment) and is interested in the effect of treatments on 
sociability. She wishes to determine whether treatments differ from no 
treatment, whether the treatment groups do better than the Hawthorne con- 
trol group, and whether there is a difference in the efficacy of the treat- 
ments. Schematically then we have the following contrasts: 


        Control    Hawthorne Control    Treat 1    Treat 2
L1         1              1               -1         -1
L2         0              1               -.5        -.5
L3         0              0                1         -1


Is this a set of orthogonal contrasts? 


15. Levin, McCormick, Miller, Berry, and Presley (1982) had fourth grade
students learn a list of relatively complex English vocabulary words in two
experiments. In Experiment 1, pupils used either a mnemonic (keyword)
contextual or a verbal contextual procedure. In Experiment 2, three other
conditions were compared to the keyword context condition. In Experiment 2
the 64 fourth graders were randomly assigned, 16 to each group, with the
following summary statistics resulting:


               Keyword    Experiential    Picture
               Context    Context         Context    Control
Mean           72.3       36.2            42.4       48.7
Stand. Dev.    22.9       27.0            23.1       25.6


Levin et al. state in their Results section,


Performance differences among conditions were assessed in terms of five
planned non-orthogonal comparisons, each based on $\alpha = .01$. Statistical
analysis revealed that students in the keyword context condition substantially
outperformed those in the control condition, t = 2.71, p < .01, and the picture
context condition, t = 3.42, p < .005, as well as those in the experiential
context condition, t = 4.14, p < .001. Neither picture context nor experiential
context differed significantly from controls. (p. 130)


(a) What are the 5 planned comparisons?
(b) Show that they are non-orthogonal.
(c) Show how the three t values indicated above are obtained.


16. As mentioned in Section 2.10, if your interest in a study is confined to
testing each of several treatment groups against a control group, then the
Dunnett procedure is the most powerful. Let $t(d)$ represent the modified t
value from Dunnett's table (Appendix B.3), n the number of subjects in each
group, and $MS_w$ the error term. Then the critical value that must be
exceeded for a difference to be significant at some alpha level is given by
$t(d)\sqrt{2MS_w/n}$. If the group sizes are not equal and the homogeneity of
variance assumption is tenable, the use of the harmonic mean for each pair
of groups is suggested.

Suppose a study has been run with 17 subjects per group. The error term 
is 67.24, and the means are as follows: 


Control    Treat. 1    Treat. 2    Treat. 3
 26.1        29.3        34.2        31.6


Use Dunnett's procedure at the .05 level to determine which of the treat- 
ment means is significantly different from the control group mean. 


17. The following is an approximately 40% random sample of the CLINICAL
data (in Appendix A in back of book), where we present data on only the
FREEDIST variable:


GROUP 1 GROUP 2 GROUP 3 
9.00 11.00 5.33 
9.00 7.67 14.00 
7.67 12.00 10.00 
8.67 9.00 8.67 
14.67 7.33 9.33 
6.33 7.33 9.00 
7.67 7.33 9.00 
7.00 8.33 9.67 
7.33 5.33 16.33 
8.67 11.00 10.33 
8.67 7.67 9.67 
7.67 12.00 10.33 
8.00 6.00 11.33 
12.00 9.67 
7.00 9.33 
5.33 10.00 
8.00 9.67 
9.33 
5.00 
8.33 
9.00 


(a) Do a one way ANOVA on these data using either SAS or SPSS. What is
the null hypothesis? Do you reject at the .10 level?

(b) Since the group sizes are sharply unequal, test the assumption of 
equal population variances with the Levene test. Is it significant at the .05 
level? 

(c) Apply the Tukey procedure at the .10 level. Which pairs of groups are 
significantly different? 


18. On the following page is a sample of 60 children (all from site I: three
to five year old disadvantaged children from inner city areas in various parts
of the country) from the SESAME STREET data base. The grouping variable is
VIEWCAT (coded from 1 if the children rarely watched the show to 4 if the
children watched the show on average more than 5 times a week). The
dependent variable is POSTLET-PRELET, that is, a measure of how much
the children have gained in their knowledge about letters.


VIEWCAT1 VIEWCAT2 VIEWCAT3 VIEWCAT4


7 -1 7 6 
3 4 7 6 
8 17 28 4 
0 6 16 + 
4 9 32 24 
-1 6 8 27 
2 4 -1 21 
-1 10 10 -15 
-1 9 26 4 
11 —22 24 
1 11 35 
-2 32 5 
6 10 8 
-10 33 7 
21 14 
5 33 
7 30 
31 
14 
5 


(a) Do a one way ANOVA on these data at the .05 level of significance using
either SAS or SPSS. What is the null hypothesis? Do you reject it?
(b) Since the group sizes are sharply unequal, test the assumption of 
equal population variances with the Levene test. Is it significant at the .05 
level? Should we be concerned about the result in (a) being spurious? Ex- 
plain. 

(c) Apply the Tukey procedure at the .05 level. Which pairs of groups are 
significantly different? 


19. It was mentioned in the chapter that $1 - (1 - \alpha')^k$ is approximately
equal to $k\alpha'$ for small $\alpha'$.

(a) Let k = 3. Expand $1 - (1 - \alpha')^3$, and show that it equals
$3\alpha' - 3\alpha'^2 + \alpha'^3$.

(b) Let $\alpha' = .01$. Calculate $k\alpha'$ and
$3\alpha' - 3\alpha'^2 + \alpha'^3$. What have we shown?


20. Run the CLINICAL data set.
Is there a need to use the Levene test? Explain.
Are the groups significantly different at the .05 level?


21. Run the ALCOHOL data set.

Is there a need to use the Levene test?

Are the groups significantly different at the .05 level?

Did anything interesting happen in the Tukey post hoc procedure?


22. Why are correlated observations a real problem in social science research?
In answering this question, deal with two issues: 

(a) How often do correlated observations occur? 

(b) Examine Table 2.1 


APPENDIX 


Theorem. The estimated variance for a contrast $\hat{L}$ is given by

$$\hat{\sigma}^2_{\hat{L}} = MS_w \sum_{i=1}^{k} c_i^2 / n_i$$

where the $n_i$ are the group sizes for the k groups.

Proof. The estimated contrast is

$$\hat{L} = c_1\bar{x}_1 + c_2\bar{x}_2 + \cdots + c_k\bar{x}_k$$

Now we take the variance of both sides:

$$\mathrm{var}(\hat{L}) = \mathrm{var}(c_1\bar{x}_1 + c_2\bar{x}_2 + \cdots + c_k\bar{x}_k)$$

Since the means are sampled from independent groups, the random variables
$c_1\bar{x}_1, \ldots, c_k\bar{x}_k$ are uncorrelated. But for uncorrelated
variables the variance of a sum is equal to the sum of the variances (since
all the covariance terms are 0). Thus,

$$\mathrm{var}(\hat{L}) = \mathrm{var}(c_1\bar{x}_1) + \mathrm{var}(c_2\bar{x}_2) + \cdots + \mathrm{var}(c_k\bar{x}_k)$$

Now recall from introductory statistics that if c is a constant, then the
variance of cx is $\mathrm{var}(cx) = c^2\,\mathrm{var}(x)$. Using this result
the above may be rewritten as

$$\mathrm{var}(\hat{L}) = c_1^2\,\mathrm{var}(\bar{x}_1) + c_2^2\,\mathrm{var}(\bar{x}_2) + \cdots + c_k^2\,\mathrm{var}(\bar{x}_k)$$

Now, it is well known that the variance of a sample mean based on a sample of
size n is given by $\mathrm{var}(\bar{x}) = \sigma^2/n$, where $\sigma^2$ is
the variance of the population. (See Glass & Hopkins, 1984, pp. 188-190, for a
proof.) Applying this result to $\bar{x}_1$, $\bar{x}_2$, etc., we obtain:

$$\mathrm{var}(\hat{L}) = c_1^2\sigma^2/n_1 + c_2^2\sigma^2/n_2 + \cdots + c_k^2\sigma^2/n_k$$

In obtaining the above we assumed the population variance was the same in
each of the k groups, which is the homogeneity of variance assumption for
ANOVA. Now, factoring out the common term $\sigma^2$ we have

$$\mathrm{var}(\hat{L}) = \sigma^2 (c_1^2/n_1 + c_2^2/n_2 + \cdots + c_k^2/n_k)$$

In practice $\sigma^2$ has to be estimated, and recall from the chapter that
$MS_w = \hat{\sigma}^2$. Thus, replacing $\sigma^2$ by $MS_w$, and writing the
sum of the terms in parentheses using the summation operator, we obtain the
result stated in the theorem:

$$\widehat{\mathrm{var}}(\hat{L}) = MS_w \sum_{i=1}^{k} c_i^2 / n_i$$
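
The theorem translates directly into a few lines of code. The sketch below is only an illustration: the function names and all numerical values (means, group sizes, and the error term) are hypothetical and are not taken from the text. It computes the estimated contrast, its estimated variance from the theorem, and the ratio of the contrast to its standard error.

```python
from math import sqrt

def contrast_estimate(coeffs, means):
    """Estimated contrast: sum of c_i times the i-th sample mean."""
    return sum(c * m for c, m in zip(coeffs, means))

def contrast_variance(ms_w, coeffs, group_sizes):
    """Estimated variance of the contrast: MS_w * sum(c_i^2 / n_i)."""
    return ms_w * sum(c * c / n for c, n in zip(coeffs, group_sizes))

# Hypothetical data for a four group study (illustration only).
coeffs = [0.5, 0.5, -0.5, -0.5]      # average of groups 1, 2 vs. groups 3, 4
means = [12.0, 14.0, 9.0, 11.0]
group_sizes = [10, 10, 10, 10]
ms_w = 20.0                          # error term (mean square within)

l_hat = contrast_estimate(coeffs, means)
var_l = contrast_variance(ms_w, coeffs, group_sizes)
t_stat = l_hat / sqrt(var_l)         # contrast divided by its standard error

print("contrast =", l_hat)
print("estimated variance =", round(var_l, 4))
print("t =", round(t_stat, 3))
```

Because each squared coefficient is divided by its own group size, the same function handles unequal group sizes without any change.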


Power Analysis 
3.1 INTRODUCTION


Recall from Chapter 2 that type I error, or the level of significance
($\alpha$), is the probability of rejecting the null hypothesis when it is
true, in effect saying the groups differ when they do not. The $\alpha$ level
set by the experimenter is a subjective decision, but it is usually set at .05
or .01 by most researchers. The reason for setting $\alpha$ so low, of course,
is to minimize the probability of making this error. In statistical inference
we can never be sure we have made the correct decision; however, by setting
$\alpha$ very low we can control quite effectively the risk of this type of
error occurring. As we shall see shortly, though, it is not always wise to set
$\alpha$ as low as .05 or .01, especially when the group sizes are less than 20.

There is another type of error that one can make in conducting a statistical
test, and this is called a type II error. Type II error (denoted by $\beta$) is
the probability of accepting $H_0$ when it is false, i.e., saying the groups
don't differ when they do. Note that only one of these errors can occur in a
given study. Either we falsely reject $H_0$ or we falsely accept $H_0$. Now,
not only can either of these errors occur, but in addition they are inversely
related. That is, as we control on type I error, type II error increases. We
illustrate this below for a two group problem with 15 subjects per group and a
difference between the means of one half a standard deviation. Notice that as
we control on $\alpha$ more severely (from .10 to .01), type II error increases
fairly sharply (from .37 to .78).


$\alpha$     $\beta$     $1 - \beta$
.10      .37      .63
.05      .52      .48
.01      .78      .22


The quantity in the last column is the power of a statistical test, and is the
probability of rejecting $H_0$ when it is false. Thus, power is the probability
of making a correct decision. In the above example, if we are willing to take a
10% chance of rejecting $H_0$ falsely, then we have a 63% chance of finding a
difference of a specified magnitude in the population (more specifics on this
later in the chapter). On the other hand, if we insist on only a 1% chance of
rejecting $H_0$ falsely, then there are only about 2 chances out of 10 of
finding the difference (i.e., power = .22). This example with small sample size
suggests that in this case it might be prudent to abandon the traditional
levels of .01 or .05 (and especially .01) in favor of a more liberal $\alpha$
level to improve power sharply. Of course, one does not get something for
nothing. We are taking a greater risk of rejecting falsely, but that increased
risk is more than balanced by the increase in power.

Cast in a broad context, power is dependent on many factors: (1) $\alpha$
level, (2) sample size, (3) effect size, (4) the statistical test used, and
(5) the research design. For example, the t test for dependent samples is more
powerful than the t test for independent samples, and a repeated measures
design (discussed in Chapter 5) is more powerful than a one way ANOVA design.
However, the power of a specific statistical test is dependent on these 3
factors:


1. The $\alpha$ level set by the experimenter.

2. Sample size. 

3. Effect size—How much of a difference the treatments make, or the extent 
to which the groups differ in the population on the dependent variable. 


We have already indicated that power may be increased substantially by
adopting a somewhat more liberal $\alpha$ level, say .10 or .15. There are
limits on this, however; no one would take $\alpha = .40$ to gain still greater
power, for this is taking far too great a risk of rejecting $H_0$ falsely.

Power is heavily dependent on sample size. Consider a two tailed test at the
.05 level for the t test for independent samples. Suppose we have an effect
size of .5 standard deviations in the population, i.e., the difference in the
means divided by the standard deviation is .5. The table below shows how power
changes dramatically as sample size increases from small (10) to large (100):


n (subjects per group)    power
        10                 .18
        20                 .33
        50                 .70
       100                 .94


With only 10 subjects per group we have only about a 20% chance of detecting 
this effect size, whereas with 100 subjects per group we are almost certain of de- 
tecting the effect (i.e., rejecting the null hypothesis). 
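
The trend in the table can be reproduced approximately with a short calculation. The sketch below is not how the tabled values were obtained; it assumes a normal approximation to the two tailed t test (an assumption of this example, as is the function name), so it runs slightly high when n is small and agrees closely with the table for n = 50 and n = 100.

```python
from math import sqrt
from statistics import NormalDist

def approx_power_two_group(d, n_per_group, alpha=0.05):
    """Approximate power of a two tailed, two group test via the normal curve.

    d is the effect size in standard deviation units and n_per_group is the
    common group size. This ignores the extra uncertainty from estimating the
    standard deviation, so it overstates power a little for small n.
    """
    z = NormalDist()
    delta = d * sqrt(n_per_group / 2)       # expected size of the test statistic
    z_crit = z.inv_cdf(1 - alpha / 2)       # two tailed critical value
    upper = 1 - z.cdf(z_crit - delta)       # rejection in the expected direction
    lower = z.cdf(-z_crit - delta)          # rejection in the wrong direction
    return upper + lower

for n in (10, 20, 50, 100):
    print(n, round(approx_power_two_group(0.5, n), 2))
```

Changing `d` or `alpha` in the call shows how the other two factors listed above move power for a fixed group size.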

As the above example suggests, when sample size is large (say more than 100 
subjects per group) power will generally not be an issue. In these cases power will 
tend to be adequate (> .70) to excellent (> .90). It is when one is conducting a study 
where the group sizes are small (n < 20), or when one is evaluating a study that had
small group size, that it is imperative to be very sensitive to the possibility of a type 
II error. 

The third factor that affects power is effect size. If the effect size is small or me- 
dium, then we will see shortly that large group size is needed to detect these ef- 
fects, i.e., to have adequate power. On the other hand, if the effect size is large 
(about one standard deviation or greater), then only about 15 subjects per group 
will be needed for adequate power. 

Most statistics books do not do a good job of discussing the consequences of 
making a type I or a type II error. Let me attempt to remedy this deplorable situa- 
tion. Suppose you are comparing a treatment group vs. a control group on some de- 
pendent (outcome) variable. Treatment here is generic and could refer to teaching 
method, counseling method, diet, drug, etc. Schematically: 


TREAT      CONTROL
$\mu_1$        $\mu_2$

The null and alternative hypotheses are as follows:

$$H_0: \mu_1 = \mu_2 \qquad H_a: \mu_1 \neq \mu_2$$


If a type I error is made (rejecting $H_0$ when it is true) we are saying that
the treatment is effective when in fact it is not. This is false optimism. A
school district, for example, may invest in a program heavily. If a statistical
test is done and a type I error is made, the program is not effective and yet
much money has been spent.

Now consider the other side of the coin, that is, a type II error. A type II
error is accepting the null hypothesis when it is false. If a type II error is
made, we may have “the greatest thing since sliced bread” and not know it. This
is false negativism. In a medical sense a type II error could be dangerous or
deadly. It would be like telling someone they don't have a disease when in fact
they do. In this case, someone may die before it is realized that a type II
error was made.


3.2 t TEST FOR INDEPENDENT SAMPLES


Cohen (1977) has defined the population effect size as

$$d = (\mu_1 - \mu_2)/\sigma \qquad (1)$$

where $\sigma$ is the assumed common population standard deviation. This
population effect size is estimated by

$$\hat{d} = (\bar{x}_1 - \bar{x}_2)/\hat{\sigma}$$

where

$$\hat{\sigma}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

is the estimate of the assumed common population variance.

It is necessary to divide by $\sigma$ in obtaining the effect size measure to
adjust for scaling differences on variables, which can be quite arbitrary in
social science research. Note that Equation 1 expresses the difference between
the groups in standard deviation units. For example, if the means for the
groups were $\bar{x}_1 = 10$ and $\bar{x}_2 = 4$, with the estimated standard
deviation s = 15, then $\hat{d} = (10 - 4)/15 = .4$, or the groups differ by
.4 of a standard deviation.
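
The estimate above is easy to compute from summary statistics. The following is a minimal sketch; the function names are illustrative, and the group sizes and standard deviations in the second part are hypothetical values chosen only to show the pooling step.

```python
from math import sqrt

def pooled_sd(n1, s1, n2, s2):
    """Square root of the pooled estimate of the common population variance."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return sqrt(pooled_var)

def estimated_d(mean1, mean2, s):
    """Estimated effect size: difference in means in standard deviation units."""
    return (mean1 - mean2) / s

# Example from the text: means of 10 and 4, estimated standard deviation 15.
d_text = estimated_d(10, 4, 15)
print("d (text example) =", round(d_text, 2))

# Hypothetical summary statistics, to show pooling with unequal group sizes.
s = pooled_sd(n1=12, s1=14.0, n2=18, s2=16.0)
print("pooled sd =", round(s, 2))
print("d (hypothetical) =", round(estimated_d(52.0, 45.0, s), 2))
```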

An effect size around .20 is considered small, an effect size around .50,
medium, and an effect size > .80 is large. A medium effect size is one that
would be apparent to a researcher. For example, .5 standard deviations is the
difference in mean I.Q. between semiskilled workers and professionals and
managers. The difference in mean I.Q.s between PhDs and typical college
freshmen is an example of a large effect size, that is, about .8 standard
deviations.

The following power values for the t test ($\alpha = .05$, two tailed test)
from Cohen’s text illustrate precisely how poor power is with small group size
and/or small effect size:

1. $n_1 = n_2 = 15$, d = .50, power = .26
2. $n_1 = n_2 = 35$, d = .30, power = .23


Power can be adequate with small group size, but only if the effect size is
large. For example, with $n_1 = n_2 = 15$ and d = 1, then power = .75 at
$\alpha = .05$ for a two tailed test.

Cohen and many others have noted that small and medium effect sizes are very 
common in social science research. Light and Pillimer (1984) in Summing Up 
comment on the fact that most evaluations find small effects in reviews of the liter- 
ature on programs of various types (social, educational, etc.): “Review after review 
confirms it and drives it home. Its importance comes from having managers under- 
stand that they should not expect large, positive findings to emerge routinely from 
a single study of a new program. Indeed any positive findings are good news” (pp. 
153-154). To further document the fact that small and medium effect sizes are 
common, we present in Table 3.1 the effect sizes for three sets of studies in quite 
different areas. Note that there are only 3 large effect sizes out of 40. 

How does one estimate power if the group sizes are unequal? Cohen has
suggested using the harmonic mean. Recall from Chapter 2 that the harmonic
mean for two groups is given by $2n_1n_2/(n_1 + n_2)$. Thus, if we had group
sizes of 10 and 20, we compute the harmonic mean as 2(10)(20)/30 = 13.3, and
use 13 as the n with which to enter Cohen’s power tables. Note that use of the
harmonic mean weights
the estimate of power down toward the smaller group size. The difference between 
the ordinary mean and harmonic mean is relatively small when group sizes are ap- 
proximately equal, but when the group sizes are sharply unequal the difference can 
be considerable, as the following table shows: 


Group 1    Group 2    Mean    Harmonic mean
  10         15       12.5        12.0
  10         20       15.0        13.3
  10         30       20.0        15.0
  10         40       25.0        16.0
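
The entries in this table follow directly from the two group formula given above. As a brief sketch (the function name is illustrative), the loop below recomputes the ordinary and harmonic means for the same pairs of group sizes:

```python
def harmonic_mean_two(n1, n2):
    """Harmonic mean of two group sizes: 2 * n1 * n2 / (n1 + n2)."""
    return 2 * n1 * n2 / (n1 + n2)

for n2 in (15, 20, 30, 40):
    ordinary = (10 + n2) / 2
    harmonic = harmonic_mean_two(10, n2)
    print(10, n2, ordinary, round(harmonic, 1))
```

The harmonic mean never exceeds the ordinary mean, and the gap widens as the group sizes become more unequal, which is why it gives the more conservative n for entering the power tables.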


Researchers not sufficiently sensitive to the power problem may interpret
nonsignificant results from studies as demonstrating that “treatments” made no
difference. In fact, however, it may be that treatments did make a difference,
but that the researchers had poor power for detecting the difference. The poor
power may result from small sample size (e.g., < 20 Ss per group) and/or from
small effect size. The danger of low power studies is that they may stifle or
cut off further research in an area where effects do exist, but perhaps are
more subtle (as in personality, social, or clinical psychology).


TABLE 3.1
Effect Sizes for Three Sets of Studies: Teacher Expectancy, Desegregation,
and Gender Influenceability (Data from Becker, 1987)


Teacher Expectancy

            Sample Size
Study     n_e*     n_c     Effect Size
  1        79      339        .03
  2        60      189        .12
  3        72       72       -.14
  4        11       22       1.18
  5        11       22        .26
  6       129      348       -.06
  7       110      636       -.02
  8        26       99       -.32
  9        75       74        .27
 10        32       32        .80
 11        22       22        .54
 12        43       38        .18
 13        24       24       -.02
 14        19       32        .23
 15        80       79       -.18
 16        72       72       -.06
 17        65      255        .30
 18       233      224        .07
 19        65       67       -.07

Desegregation                          Gender Influenceability
Study    Des**    Seg    Effect        Study     M      F     Effect
  1       27      32      .32            1       70     71     -.22
  2       39      36      .37            2       60     59      .04
  3       28      38      .49            3      118    136      .35
  4       29      35      .12            4       77    114     -.30
  5       36      35      .22            5       32     32      .63
  6       42      35      .29            6       10     10      .81
  7       25      48      .59            7       45     45      .39
  8       24      48     -.32            8       30     30      .46
  9       38      42     -.20            9       40     40      .36
 10      131      78      .19           10       61     64     -.06
 11       37     101      .24


*n_e and n_c: numbers of subjects in experimental and control groups
**Des: number of desegregated schools, Seg: number of segregated schools

In introductory statistics courses one vs. two tailed tests, or directional vs. 
non-directional alternative hypotheses, were discussed. It was indicated that one 
should do a one tail test if there is empirical evidence (previous studies) and/or the- 
ory to suggest a difference in a specified direction. The statistical advantage of a 
one tail test over a two tail test is that it is more powerful. In one of the exercises 
you are asked to explain why this is so. To further dramatize the considerable
difference it can make in power if one adopts a somewhat more liberal $\alpha$
level and is able to do a one tail test, consider the table below:


Power of t test for independent samples ($n_1 = n_2 = 20$)

                                      moderate effect size    large effect size
$\alpha$ level & nature of test             (d = .6)              (d = .8)
$\alpha = .01$, two tail                      .22                   .44
$\alpha = .05$, one tail                      .59                   .80
$\alpha = .10$, one tail                      .72                   .89


At the traditional $\alpha$ level of .01 with a two tail test, power is poor in
both cases, while at the .10 level, with the added advantage of a one tail
test, power is good in both cases.
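
One way to see why the rows differ so much is to look at the critical value the test statistic must exceed. The sketch below is an illustration under a normal approximation (an assumption of this example, not the method behind the tabled values), so its power figures run somewhat higher than the table, especially in the first row; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def approx_power(d, n_per_group, alpha, tails):
    """Approximate power for a two group test; tails is 1 or 2."""
    delta = d * sqrt(n_per_group / 2)        # expected size of the test statistic
    z_crit = Z.inv_cdf(1 - alpha / tails)    # smaller for one tail, larger alpha
    power = 1 - Z.cdf(z_crit - delta)
    if tails == 2:
        power += Z.cdf(-z_crit - delta)      # rejection in the opposite tail
    return z_crit, power

for alpha, tails in ((0.01, 2), (0.05, 1), (0.10, 1)):
    crit, p_moderate = approx_power(0.6, 20, alpha, tails)
    _, p_large = approx_power(0.8, 20, alpha, tails)
    print(alpha, tails, round(crit, 2), round(p_moderate, 2), round(p_large, 2))
```

Moving from a two tail test at .01 to a one tail test at .10 roughly halves the critical value, and that is the source of the gain in power.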


3.3 A PRIORI AND POST HOC ESTIMATION 
OF POWER 


If a researcher is going to invest a great amount of time and money in carrying
out a study, then he or she would certainly want to have a 70 or 80% chance
(i.e., power = .70 or .80) of finding a difference if one is there. Thus, the
a priori estimation of power alerts the researcher as to how many subjects per
group are needed to have adequate power. This is an important part of
experimental planning. More on this shortly.

The post hoc estimation of power is important in terms of how one interprets the 
results of completed studies. The following example shows how important an 
awareness of power can be. Cronbach and Snow (1969) had written a report on ap- 
titude-treatment interaction research, not being fully cognizant of the importance 
of power. By the publication of their text, Aptitude and Instructional Methods
(1977), on the same topic they acknowledged the importance of power, stating in
the preface, “(We) ... became aware of the critical relevance of statistical
power, and consequently changed our interpretations of individual studies and
sometimes of whole bodies of literature.” Why would they change their
interpretation of a whole body of literature? Because, prior to being sensitive
to power, when they found most studies in a given body of literature had
nonsignificant results, they concluded no effect existed. However, after being
sensitized to power they took into account the sample sizes in the studies, and
also the magnitude of the effect sizes. If the sample sizes were small in most
of the studies with nonsignificant results, then the lack of significance may
well be due to poor power. Or, in other words, several low power studies that
report nonsignificant results of the same character are evidence for an effect.
By the same character we mean that the test statistic is “leaning” in the same
direction in all cases.

Incidentally, the effect size (d) for the t test can be expressed in terms of
the t statistic as follows:

$$\hat{d} = t\sqrt{1/n_1 + 1/n_2} \qquad (2)$$

where $n_1$ and $n_2$ are the respective group sizes. This equation is very
helpful in estimating the power of completed studies, since from Equation 2, d
is quickly computed and then Cohen's power tables are entered.
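
As a small sketch of this conversion, the function below assumes the relation between the t statistic and the estimated effect size has the form t times the square root of (1/n1 + 1/n2). The function name and the reported t value and group sizes are hypothetical.

```python
from math import sqrt

def d_from_t(t, n1, n2):
    """Estimated effect size recovered from a reported independent samples t."""
    return t * sqrt(1 / n1 + 1 / n2)

# Hypothetical completed study: t = 1.50 with 12 and 15 subjects per group.
d_hat = d_from_t(1.50, 12, 15)
print("estimated d =", round(d_hat, 2))
```

With the estimated d and the harmonic mean of the two group sizes in hand, a power table can be entered for a completed study even when only t and the group sizes were reported.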


3.4 ESTIMATION OF POWER FOR ONE WAY ANALYSIS 
OF VARIANCE 


To define the population effect size for a one way ANOVA we first define a
measure of variability of the group means about the grand mean which is
independent of sample size:

$$\hat{\sigma}_m = \sqrt{\sum_{j=1}^{k} n_j(\bar{x}_j - \bar{x})^2 / N}$$

where $\bar{x}$ is the grand mean and N is total sample size. Dividing by N
makes the measure independent of sample size. Notice that the numerator is just
$SS_b$. To make our effect measure scale free, we again divide by
$\hat{\sigma}$ as was done for the t test effect measure:

$$\hat{f} = \hat{\sigma}_m / \hat{\sigma} \qquad (3)$$

This measure represents the standard deviation of the standardized means, i.e.,
variability of z score group means about the grand mean. Now, it can be shown
(this is one of the exercises) that the estimated effect size can be expressed
in terms of the F statistic as follows:

$$\hat{f} = \sqrt{(k - 1)F/N} \qquad (4)$$

Cohen (1977) characterizes an f around .1 as a small effect size, an f around
.25 as medium, and an f > .4 as a large effect size. The above equation is
quite useful for post hoc estimation of power, since all one needs is the F
statistic from the study to obtain the corresponding effect size. With the
effect size and the common group size (or the average group size if the group
sizes are unequal), the power for the study is easily determined using Cohen's
power tables. We give two examples to illustrate.


Example 1 


A three group study was done by Harrington (1968) on the efficacy of advance
organizers in mathematics. He had 10 subjects per group and obtained F = 4.38.
What was his power at $\alpha = .05$? First, we obtain the estimated effect
size using Equation 4:

$$\hat{f} = \sqrt{(k - 1)F/N} = \sqrt{2(4.38)/30} = .54 \text{ (large effect size)}$$

In using Cohen’s tables recall that n is the common number of subjects per
group. Also, for a one way ANOVA, the u in the tables refers to the between
groups degrees of freedom, which is (k - 1). Thus, here we have u = 2. We find
that power = .64 at f = .50 and power = .81 at f = .60 (cf. Table C.2).
Therefore, by interpolation, power is estimated as .73, which is adequate.
Incidentally, Harrington’s F was significant at the .05 level. Note that
Harrington had adequate power to reject $H_0$ in spite of his small sample size
because his treatment effect was very large.


Example 2 


Consider a four group study with $n_1 = 15$, $n_2 = 13$, $n_3 = 20$, and
$n_4 = 17$. The investigator obtained F = 1.4. What was her power at
$\alpha = .05$? At $\alpha = .10$?
First we obtain the effect size:

$$\hat{f} = \sqrt{3(1.4)/65} = .254 \text{ (medium effect size)}$$

Next, the average group size is 16.25 (we use 16), and u = k - 1 = 3.
Therefore, at $\alpha = .05$, power = .34 (Table C.3), whereas at
$\alpha = .10$, power = .48 (Table C.7). In both cases the power is inadequate.
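
The effect size step in both examples can be checked with a few lines of code. This sketch covers only the conversion from F to the estimated effect size; the power values themselves still come from Cohen's tables. The function name is illustrative.

```python
from math import sqrt

def f_from_F(F, k, N):
    """Estimated effect size f from the F statistic, k groups, total N."""
    return sqrt((k - 1) * F / N)

# Example 1: three groups, 10 subjects per group, F = 4.38.
f_example1 = f_from_F(4.38, k=3, N=30)

# Example 2: four groups with sizes 15, 13, 20, 17 and F = 1.4.
sizes = [15, 13, 20, 17]
f_example2 = f_from_F(1.4, k=len(sizes), N=sum(sizes))
average_n = sum(sizes) / len(sizes)

print("Example 1: f =", round(f_example1, 2))
print("Example 2: f =", round(f_example2, 3), " average n =", average_n)
```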

If a post hoc power analysis is done on a study where significance is not found 
and the effect size is quite small (< .10), then one must decide whether such an ef- 
fect has any practical significance. On the other hand, when significance is not 
found and a post hoc power analysis reveals a large or medium effect size, then it is 
essential to replicate the study with more adequate sample size. 
3.5 A PRIORI ESTIMATION OF SUBJECTS NEEDED
FOR A GIVEN POWER


Here we need an expected effect size in order to enter the power tables to
determine how many subjects per group are necessary for a specified power at
some $\alpha$ level. One could use the average of the estimated effect sizes
from studies similar to yours; that is, similar in nature of treatments,
duration of treatments, type of subjects, dependent variable used, instrument
used to measure the dependent variable, etc. Whether a study is similar in
enough of the above respects to qualify as an estimator of your expected effect
size is, of course, a subjective judgment. But even if an estimate is fairly
rough, as long as we can obtain at least two such estimates, the average of
these will probably be reasonably accurate. Furthermore, it is surely better to
have some estimate and hence be able to determine approximately how many
subjects are needed, rather than to have no idea at all.


Example 3 


Suppose investigator X has found two studies similar to his. 


Study 1: 3 groups, N = 42, F = 2.16 
Study 2: 3 groups, N= 81, F = 1.42 


Now, by using Equation 4 relating F and effect size we find:

$$\hat{f}_1 = \sqrt{2(2.16)/42} = .32 \quad \text{and} \quad \hat{f}_2 = \sqrt{2(1.42)/81} = .187$$

Thus, the expected effect size for investigator X’s study is
(.32 + .187)/2 = .25. Now, the investigator wishes to know how many subjects
per group are necessary to have power = .70 at $\alpha = .05$ and at
$\alpha = .10$. Referring to Table C.3 and reading down the column under
f = .25 until we reach a power value of at least .70, we see that 42 subjects
per group will be needed at the .05 level. Now, using Table C.6 for the .10
level, we see that 32 subjects per group are needed.

In the above example the effect sizes were weighted evenly in determining the 
expected effect size. However, if one of the studies were much more similar to the 
study being conducted, then the investigator should weight that effect size more 
heavily, perhaps giving it double the weight. 
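
The averaging step, with or without unequal weights, can be sketched as follows. The function names are illustrative, and the choice to double the weight of Study 1 is purely hypothetical, shown only to illustrate the weighting idea mentioned above.

```python
from math import sqrt

def f_from_F(F, k, N):
    """Estimated effect size f from the F statistic, k groups, total N."""
    return sqrt((k - 1) * F / N)

def expected_effect_size(effects, weights=None):
    """Weighted average of effect sizes; equal weights if none are given."""
    if weights is None:
        weights = [1] * len(effects)
    return sum(w * f for w, f in zip(weights, effects)) / sum(weights)

f_study1 = f_from_F(2.16, k=3, N=42)
f_study2 = f_from_F(1.42, k=3, N=81)

equal = expected_effect_size([f_study1, f_study2])
# Hypothetical: Study 1 judged much more similar, so it gets double weight.
weighted = expected_effect_size([f_study1, f_study2], weights=[2, 1])

print("equal weights:", round(equal, 2))
print("Study 1 double weighted:", round(weighted, 2))
```

Because the more heavily weighted study here has the larger effect size, the expected effect size rises, and fewer subjects per group would be read from the power tables.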
3.6 WAYS OF IMPROVING POWER


Given how poor power is generally with fewer than 20 subjects per group, the
investigator should consider the following four ways of improving power:


1. Adopt a more lenient $\alpha$ level, perhaps $\alpha = .10$ or $\alpha = .15$.

2. Use one tailed tests where the literature supports a directional hypothesis. 

3. Consider ways of reducing within group variability, so that a more sensi- 
tive design results. One way is through sample selection; more homoge- 
neous subjects will tend to vary less on the dependent variable. For exam- 
ple, use just males, rather than males and females, or just use 6 and 7 year 
old children rather than using 6 through 9 year old children. Another way is 
through the use of factorial designs, which will be considered in Chapter 4. 
A third way to reduce within group variability is through the use of analysis 
of covariance, to be covered in Chapter 7. Covariates that have low correla- 
tions with each other are particularly helpful because each is removing a 
somewhat different part of the within group variance. A fourth way is 
through the use of repeated measures designs. These designs are very help- 
ful because individual differences due to the average response of subjects 
are removed from the error term, and such differences are the main reason 
for within group variability. 

4. Make sure there is a strong linkage between the treatments and the depend- 
ent variable, and that the treatments extend over a long enough period of 
time to produce a large or at least a fairly large effect size. 


It needs to be mentioned that how far one “pushes” the power issue depends on 
the consequences of making a type I error. One of the reviewers of this text noted 
that discussing power versus risk reduction (type I error) is most meaningfully 
considered within a given context and gave the following examples: “If I am test- 
ing two teaching methods which cost the same, I go for power. If one method is 10 
times more dollars, I go for risk reduction. If I’m comparing drugs A and B and one 
has some potent side effects, I go for risk reduction.” 

In the teaching methods example, if a type I error is made in concluding that the 
method that is 10 times more expensive is more effective, this will be a very costly 
mistake for a school district. If a type I error is made in the drug example in con- 
cluding drug A (with potent side effects) is better when it is not, this will have seri- 
ous health consequences for future subjects receiving drug A. 

The point the reviewer was making, which is well taken, is that using alpha = 
.10 or .15 to improve power may not be a wise choice in some cases. 
3.7 POWER ESTIMATION ON SPSS MANOVA 


Starting with Release 2.2, you can obtain power estimates for various statistical 
tests using the SPSS MANOVA program with the POWER subcommand. To quote 
from the SPSS User’s Guide (1988, 3rd edition), “The POWER subcommand 
requests power values based on fixed-effects assumptions for all univariate and 
multivariate F and T tests” (p. 601). Power can be obtained for any α level between 
0 and 1, with .05 being the default value. If we wish power at the .05 level for a one 
way ANOVA, we simply insert the following subcommand: POWER = F(.05)/, or 
if we wish to know the power at the .15 level: POWER = F(.15)/. We give two 
examples to illustrate use of the POWER subcommand. The first is the t test for 
independent samples example from Chapter 1 (Section 1.2), and the second is the one 
way ANOVA example from Chapter 2 (Section 2.9). The command lines for both 
and selected printout showing the power values are given in Table 3.2. 

The effect size measure in each case in Table 3.2 is partial eta squared (partial $\eta^2$), 
which is given by 


   partial $\eta^2 = (df_h \cdot F) / (df_h \cdot F + df_e)$   (5) 


where $df_h$ denotes degrees of freedom for hypothesis and $df_e$ denotes degrees of 
freedom for error (Cohen, 1973). As the SPSS User's Guide (1988, p. 602) notes, 
“partial eta squared is an overestimate of the actual effect size. However, it is a 
consistent measure of effect size and is applicable to all F and t tests.” 

Cohen (1977, p. 281), in discussing power in one way ANOVA, notes the 
following relationship between $\eta^2$ and the effect size f: 


   $\eta^2 = f^2 / (1 + f^2)$   (6) 


By squaring Equation 4 we find that $f^2 = (k - 1)F/N$. Plugging this into 
Equation 6, and with some algebraic simplification, we find that 


   $\eta^2 = (k - 1) \cdot F / [(k - 1) \cdot F + N]$   (7) 

Now, let us compare this with what partial $\eta^2$ will be for a one way ANOVA. For 
one way ANOVA, $df_h = (k - 1)$ and $df_e = (N - k)$. Plugging these into Equation 5 we 
obtain 


   partial $\eta^2 = (k - 1) \cdot F / [(k - 1) \cdot F + (N - k)]$   (8) 


Thus the only difference between $\eta^2$ and partial $\eta^2$ for one way ANOVA is N vs. 
(N - k) in the denominator. Since the denominator for Equation 8 will always be 
smaller than for Equation 7, partial $\eta^2$ will be an overestimate of effect size; however, 
for moderately large N (say > 50) the difference between the two is small. 
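
To see how quickly the two measures converge, Equations 7 and 8 can be evaluated side by side. The sketch below is illustrative only: the function names and the values F = 3.0 and k = 3 are hypothetical, and holding F fixed while N changes is done purely to isolate the effect of N versus (N - k) in the denominator.

```python
def eta_squared(F, k, N):
    """Equation 7: (k - 1)F / [(k - 1)F + N]."""
    num = (k - 1) * F
    return num / (num + N)

def partial_eta_squared(F, k, N):
    """Equation 8: (k - 1)F / [(k - 1)F + (N - k)]."""
    num = (k - 1) * F
    return num / (num + (N - k))

# Hypothetical F and k; only the total sample size N varies.
gaps = {}
for N in (15, 60, 300):
    e = eta_squared(3.0, 3, N)
    p = partial_eta_squared(3.0, 3, N)
    gaps[N] = p - e
    print(N, round(e, 3), round(p, 3), round(gaps[N], 4))
```

Partial $\eta^2$ is the larger of the two at every N, and the gap shrinks as N grows.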


TABLE 3.2 


Power Analysis Runs on SPSS MANOVA for t test for Independent 
Samples and for a One Way ANOVA 


t test 

TITLE ‘T TEST FROM CHAP. 1’. 
DATA LIST FREE/TREAT PERFORM. 
BEGIN DATA. 
12151516161718 
19212122232324 
25272728 

END DATA. 

MANOVA PERFORM BY TREAT (1, 2)/ 
POWER = F (.05)/ 

PRINT = CELLINFO (MEANS) 
SIGNIF(EFSIZE)/. 


One Way ANOVA 


TITLE ‘ONE WAY ANOVA FROM 2.8’. 


DATA LIST FREE/GPID Y. 
BEGIN DATA. 
12131516 

2729211 
3434353833 
48444747 

END DATA. 

MANOVA Y BY GPID (1,4)/ 
① POWER = F (.05)/ 

PRINT = CELLINFO (MEANS) 
② SIGNIF (EFSIZE)/. 


[Selected printout showing the power values is not reproduced here.] 


① This is the POWER subcommand to obtain estimated power at the .05 level. 
② This subcommand is needed to obtain the effect size measure partial $\eta^2$. 
③ Note that power is poor here. This was the example with large effect size but small sample size. 
In discussing $\eta^2$ for one way ANOVA, Cohen (1977) characterizes $\eta^2 = .01$ as 
corresponding to a small effect size, $\eta^2 = .06$ to a medium effect size, and $\eta^2 = .14$ 
to a large effect size. 


3.8 SUMMARY 


1. Power is the probability of rejecting the null hypothesis when it is false. 

2. Power for a specific statistical test is dependent on (a) level of significance, 
(b) sample size, and (c) effect size. It is important to realize that power is heavily 
dependent on sample size. 

3. Small and medium effect sizes are very common in social science research. 

4. Cohen has provided the following rough, but useful, guidelines for small, 
medium, and large effect sizes for the t test and for one way analysis of variance: 


   t test: d ≈ .20 (small), d ≈ .50 (medium), d > .80 (large) 
   F test: f ≈ .10 (small), f ≈ .25 (medium), f > .40 (large) 


5. Post hoc estimation of power for completed studies is important in properly 
interpreting results. In particular, lack of significance in small sample studies may 
be simply due to inadequate power. 

6. The following relationships exist between the t and F statistics and their 
corresponding effect size measures: 


   $d = t\sqrt{1/n_1 + 1/n_2}$,   $f = \sqrt{(k - 1) \cdot F / N}$ 


Using these relationships, the corresponding effect size can be easily computed 
for any study in the literature and then Cohen's power tables entered to determine 
power. 

7. A priori determination of sample size required for a given power at some α 
level requires an estimate of the anticipated effect size. This estimate can be 
obtained using estimated effect sizes from previous similar studies and/or from 
theory. 
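
The two conversions in item 6 are easy to script when effect sizes are needed for several studies from the literature. The sketch below is a minimal illustration; the function names and the input values are hypothetical and do not come from any study in the text.

```python
import math

def d_from_t(t, n1, n2):
    """d = t * sqrt(1/n1 + 1/n2) for the independent samples t test."""
    return t * math.sqrt(1 / n1 + 1 / n2)

def f_from_F(F, k, N):
    """f = sqrt((k - 1) * F / N) for a one way ANOVA with k groups, N subjects."""
    return math.sqrt((k - 1) * F / N)

# Hypothetical reported results.
d = d_from_t(t=2.0, n1=20, n2=20)
f = f_from_F(F=3.0, k=4, N=40)
print(round(d, 3), round(f, 3))
```

The resulting d or f is then the value used to enter Cohen's power tables.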


EXERCISES 


1. Graphically, type I error is the area under the F distribution (for H0 true) in 
the critical region, while type II error is the area under the F distribution 
(for H0 false) not in the critical region. Below are given the F distributions 
for H0 true and for a case when it is not true for the situation of 4 groups and 
30 error degrees of freedom. Also shown are the critical values for α = .10 
and .01. 


[Figure: the F distribution for H0 true and the F distribution for H0 false, with the critical values 2.28 and 4.51 marked on the horizontal axis.] 


(a) Using different degrees of shading, indicate what areas correspond to 
alpha levels of .10 and .01. 

(b) Now, using lining, cross hatching, etc., indicate what areas corre- 
spond to type II errors for the above alpha levels. 

(c) As the alpha level decreases in (a), what happens to the sizes of the 
areas? 

(d) As the alpha level decreases, what happens to the sizes of the areas 
corresponding to type II error? Thus, what have we shown graphically? 


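2. Starting from the following form for the t test for independent samples 


   $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$ 


show that $d = t\sqrt{1/n_1 + 1/n_2}$ as given in Equation 2. 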


3. Explain why for the same alpha level a one tail test is more powerful than a 
two tail test. 


4. A marketing research study by Sternthal, Dholakia, and Leavitt (1978) was 
designed to test cognitive response theory predictions about the persuasive 
efforts of source credibility and initial opinion on number of counterarguments 
generated. There were 37 subjects, 17 who had a positive prior 
opinion and 20 who were initially negative. Each of these subjects was 
assigned to either a moderate or high source credibility condition. As 
predicted, the moderate credibility source subjects generated more 
counterarguments; however, the t statistic was not significant (t = 1.43, df = 35). 
(a) What is the effect size in this study? 

(b) Estimate power at the .05 level. 

(c) What might the investigators consider doing in a future replication 
study? 


5. A researcher in counselor education reviews a small number of studies that 
have compared (a) counselors in training who participate in a classroom 
discussion on counseling skills and (b) counselors in training who both 
participate in a classroom discussion and also observe a videotape of expert 
counselors outside of the classroom. The dependent measure is empathy. 
He finds that only one of 10 such studies shows statistical significance at 
the .05 level. He thus concludes that the effectiveness of the videotape has 
not been established. Below are the sample sizes and associated t values for 
the studies: 


            Classroom 
            Discussion and    Classroom 
   Study    Videotape         Discussion       t 
     1          10                 8          1.46 
     2          12                12          1.76 
     3          11                13          1.37 
     4          25                20          1.23 
     5           6                 8          1.64 
     6          49                36          3.08* 
     7          20                20         -1.13 
     8          21                23          1.59 
     9           8                 9          1.92 
    10          10                10          -.45 


The results favor the classroom discussion and videotape group in all cases 
except studies 7 and 10. 

(a) Calculate the effect size for each of the studies. 

(b) From an examination of the effect sizes, and considering power, 
might you come to a different conclusion concerning the effectiveness of 
the combined treatment? 


6. Show that $f = \sqrt{(k - 1) \cdot F / N}$ as given in Equation 4. 


7. An ANOVA is run with 5 groups and 25 subjects per group. The F value is 
2.03, which is not significant at the .05 level. What is power at the .05 
level? At the .10 level? 


8. An investigator is in the process of planning a 4 group study in which she 
will use ANOVA to analyze the results. From previous related literature 
she estimates that the expected effect size for her study will be .35. How 
many subjects will she need per group for power = .70 at α = .05? At α = 
.10? How many subjects would be needed per group at the same alpha 
levels if she wanted power to be .80? 


9. A survey researcher compares four religious groups on their attitude 
toward education. The survey is sent out to 1200 subjects, of which 823 
eventually responded. Ten items, Likert scaled from 1 to 5, are used to assess 
attitude. A higher positive score indicates a more positive attitude. There are 
only 800 usable responses. The Protestants are split into two groups for 
analysis purposes. The group sizes, along with the means and standard 
deviations, are given below: 


                 Protestant 1    Catholic    Jewish    Protestant 2 
   $n_i$             238           182         130         250 
   $\bar{x}_i$       32.0          33.1        34.0        31.0 
   $s_i$             7.09          7.62        7.80        7.49 


An analysis of variance on these four groups yields F = 311.66/55.58 = 
5.61, which is significant at the .001 level. 

(a) Estimate power at the .05 level for the above example. 

(b) Calculate the effect size, and discuss the practical significance issue. 


10. (a) Using SPSS MANOVA, obtain estimates of power for the t test 
example in Table 3.2 at α = .10, .15, and .20. 

(b) Does power become adequate for any of the above alpha levels? 

(c) What does the above suggest might be done in small sample studies? 


11. Why do I make the statement in the SUMMARY that “In particular, lack of 
significance in small sample studies may be simply due to inadequate 
power”? 
12. Suppose we had a 2-group study with 10 subjects in Group 1 and 30 
subjects in Group 2. Use Cohen’s power tables to determine what power would 
be for a medium effect size at the .05 level. 


13. Kazdin (2003, p. 71) notes, “Indeed, a review of medical research for a 
variety of diseases and conditions revealed that over 25% of the published 
studies (1975-1990) surveyed revealed no differences between the 
treatments that were studied. In the majority of these studies power was very 
weak.” Why is this problematic? 


Factorial Analysis of Variance 


CONTENTS 

4.1 Introduction 

4.2 Numerical Calculations for Two Way ANOVA 

4.3 Balanced and Unbalanced Designs 

4.4 Higher Order Designs 

4.5 A Comprehensive Computer Example Using Real Data 

4.6 Power Analysis 

4.7 Fixed and Random Factors 

4.8 Summary 

Appendix Doing a Balanced Two Way ANOVA With a Calculator 


4.1 INTRODUCTION 


In Chapter 2 we considered the effect of a single independent (grouping) variable 
on a dependent variable, called one way ANOVA. In this chapter we extend the dis- 
cussion to examine the effect of two or more independent variables (factors) on 
some dependent variable, which is called factorial analysis of variance. Often the 
interest is in whether the additional independent variable moderates or changes the 
effect of a primary treatment variable. For example, suppose an investigator be- 
lieves the effect of three treatments on changing attitude toward minorities will 
vary according to whether the subjects are male or female. That is, the investigator 
feels treatments will work differently with these subgroups. Since there are two 
levels for sex and three levels for treatment, we have what is called a 2 x 3 factorial 
design. As another example, suppose an educational psychologist has reason to be- 
lieve from previous research that teaching method 1 will yield highest achievement 
for urban elementary children while teaching method 3 will work best with rural 
children. He is not sure which method works best with suburban children. He can 
check out these beliefs by setting up a 3 x 3 factorial design: three levels for loca- 
tion (urban, suburban, and rural) by three teaching methods. 

One broad area of research that utilizes factorial designs is called aptitude by 
treatment interaction (ATI) research. This research is concerned with the effect of 
any individual difference characteristic of subjects on their response to treatments. 
The definitive source on this type of research is Aptitudes and Instructional 
Methods by Cronbach and Snow (1977). Aptitude is defined very generally and in- 
cludes ability, personality, and nontest factors such as social class, ethnic back- 
ground, sex, etc. Cronbach and Snow discuss numerous ATI studies that have been 
done in various areas: (1) interactions of abilities with variations in instructional 
programming, (2) interactions in reading and arithmetic instruction, (3) interac- 
tions of abilities with variations in curriculum and instruction, and (4) interactive 
effects of making instruction less verbal. The reader with interest in any of the 
above areas will find the Cronbach and Snow book very interesting, critical, and 
informative. For applied researchers with a clinical orientation, there is an interest- 
ing review article by Dance and Neufeld (1988) on ATI research in the clinical set- 
ting. Here the focus is on client variables that predict differential treatment respon- 
siveness. They review the literature encompassing cognitive and/or behavioral 
treatments for anxiety, depression, pain, obesity, and tobacco dependence. 

The previous discussion has focused on one experimentally induced factor 
(treatments) and some individual difference characteristic of subjects that might 
moderate the effect of the treatments. Factorial ANOVA can be appropriate, how- 
ever, any time the subjects are cross-classified on two factors and measured on 
some dependent variable. For example, suppose a survey researcher cross-classi- 
fied 200 subjects on sex and religion (Catholic, Jewish, and Protestant) and wished 
to determine whether attitude toward abortion is influenced by sex and religion. 
She could test this out with a 2 x 3 (sex x religion) factorial design. 


Advantages of a Two Way Analysis of Variance 


A two way design enables us to examine the joint (interactive) effect of the inde- 
pendent variables on the dependent variable. We cannot get this information by 
running two separate one way analyses. An interaction means that the effect one 
independent variable has on the dependent variable is not the same for all levels of 
the other independent variable. This moderating effect can take two forms: 

(a) The degree of superiority changes, but one subgroup always does better than 
another. To illustrate this, consider the following ability by teaching methods 
design: 

                      Methods of Teaching 
                      T1      T2      T3 
   High Ability       85      80      76 
   Low Ability        60      63      68 


The numbers in each cell represent the mean achievement for subjects in that 
cell; that is, 85 is average achievement for high ability subjects under teaching 
method 1, and so on. Note that the high ability students do better than the low abil- 
ity for all teaching methods (as we would expect). However, the superiority of the 
high ability students changes from 25 for T1 to only 8 for T3. Since the order of 
superiority is maintained, however, this is called an ordinal interaction. 

(b) The superiority reverses; that is, one treatment is best with one group, but 
another treatment is better for a different group. A study by Daniels and Stevens 
(1976) provides an illustration of this more dramatic type of interaction, called a 
disordinal interaction. Using a group of college undergraduates, they considered 
two types of instruction: (1) a traditional, teacher controlled (lecture type) and (2) a 
contract for grade plan. The subjects were classified as internally or externally 
controlled, using Rotter's scale. An internal orientation means that those subjects 
perceive positive events occur as a consequence of their actions (i.e., they are in 
control), while external subjects feel that positive and/or negative events occur 
more because of powerful others, or due to chance or fate. The design and the 
means for the subjects on an achievement posttest in psychology are given below: 


                               Instruction 
                       Contract for Grade    Teacher Controlled 
   Locus of   Internal       50.52                 38.01 
   Control    External       36.33                 46.22 


The moderator variable in this case is locus of control, and it has a substantial 
effect on the efficacy of an instructional method. When the subject’s locus of con- 
trol is matched to the teaching method (internals with contract for grade and exter- 
nals with teacher controlled) the subject does quite well in terms of achievement; 
where there is a mismatch, achievement suffers. 
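
The distinction between the two forms of interaction can be expressed as a small check on the cell means. The sketch below is an assumption-laden illustration: the function name is hypothetical, it only describes the pattern in the sample means (it is not a significance test), and it looks at the differences between the two subgroups at each level of the other factor, which matches the way the two forms are described above.

```python
def interaction_pattern(group1_means, group2_means, tol=1e-9):
    """Describe the pattern of sample cell means for two subgroups
    measured at each level of a second factor."""
    diffs = [a - b for a, b in zip(group1_means, group2_means)]
    if max(diffs) - min(diffs) <= tol:
        return "no interaction in the sample means"   # profiles are parallel
    if all(d > 0 for d in diffs) or all(d < 0 for d in diffs):
        return "ordinal"        # same subgroup is always ahead, by varying amounts
    return "disordinal"         # the order of the subgroups reverses (or ties) somewhere

# High vs. low ability across T1, T2, T3.
ability = interaction_pattern([85, 80, 76], [60, 63, 68])
# Internal vs. external locus of control across the two instruction types.
locus = interaction_pattern([50.52, 38.01], [36.33, 46.22])
print(ability, locus)
```

For the ability data the differences are 25, 17, and 8, all in the same direction; for the Daniels and Stevens means the difference changes sign.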

This study also illustrates how a one way design can lead to quite misleading 
results. Suppose that Daniels and Stevens had just considered the two methods, 
ignoring locus of control. The means for achievement for the contract for grade plan 
and for teacher controlled are 43.42 and 42.11, nowhere near significant. The 
conclusion would have been that teaching methods don’t make a difference. The 
factorial study showed, however, that methods definitely do make a difference, a quite 
positive difference if subject locus of control is matched to teaching methods, and 
an undesirable effect if there is a mismatch. 

A second advantage of factorial designs is that they can lead to more powerful 
tests by reducing error (within cell) variance. If performance on the dependent 
variable is related to the individual difference characteristic (the blocking 
variable), then the reduction can be substantial. Consider the hypothetical sex x 
treatment design below: 


                     T1                            T2 
   Males      18, 19, 21, 20, 22  (2.5)     17, 16, 16, 18, 15  (1.3) 
   Females    11, 12, 14, 13, 14  (1.7)     9, 9, 11, 8, 7      (2.2) 


Notice that within each cell there is very little variability. The within cell vari- 
ances quantify this, and are given in parentheses. Recall from Chapter 3 that for 
equal group sizes the error term was just the average of the group variances. In two 
way ANOVA, for equal cell sizes, the error term is simply the average of the cell 
variances, which here is 1.925. On the other hand, if this had been considered as a 
two group design, combining males and females together, then the variability is 
considerably greater, as evidenced by within group (treatment) variances for T1 
and T2 of 18.766 and 17.6, and a pooled error term for the t test of 18.18. 
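
The within cell variances and their average can be checked directly from the scores in the table. The sketch below uses Python's standard `statistics` module; the dictionary layout and variable names are assumptions made for the illustration.

```python
from statistics import mean, variance

# Scores from the hypothetical sex x treatment design.
cells = {
    ("males", "T1"): [18, 19, 21, 20, 22],
    ("males", "T2"): [17, 16, 16, 18, 15],
    ("females", "T1"): [11, 12, 14, 13, 14],
    ("females", "T2"): [9, 9, 11, 8, 7],
}

# Sample variance (n - 1 in the denominator) within each cell.
cell_variances = {key: variance(scores) for key, scores in cells.items()}

# With equal cell sizes the two way ANOVA error term is the average cell variance.
error_term = mean(list(cell_variances.values()))

for key, value in cell_variances.items():
    print(key, round(value, 2))
print(round(error_term, 3))
```

Because sex accounts for much of the spread in the scores, each cell variance is small relative to the variability obtained when males and females are combined.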

A third advantage of a factorial design is economy of subjects. We only need 
half as many subjects to do a two way ANOVA as would be needed for two one way 
ANOVAs with the same number of levels for each factor. We use a 3 x 4 (A x B) 
factorial design with 10 subjects per cell to illustrate. Here we need a total of 
12(10) = 120 subjects to test whether A and B have a systematic effect on the 
dependent variable and to test for the joint effect of A and B (the interaction effect). To 
test whether A and B have systematic effects using two one way ANOVAs is going 
to require 40 Ss per level for A and 30 Ss per level for B, or a total of 240 subjects, as 
can be seen from the diagram below: 


B 
10 10 10 10 40 
A 10 10 10 10 40 
10 10 10 10 40 




Overview of the Five Major Sections in the Chapter 


Since this is a long chapter, we have split it up into five major sections to help the 
reader organize and see the different main thrusts. The first major section involves 
a numerical example, where we show how the sums of squares are calculated for 
the various effects in a two way ANOVA, and how to test each of the effects for 
significance. The second major section deals with equal and unequal cell size 
factorial analysis of variance. It is noted that although equal n is desirable, often in 
practice unequal cell size occurs. Also, we discuss different ways of analyzing unequal 
n designs and indicate which method should generally be used. The third major 
section discusses higher order 3 and 4 way ANOVA designs. A three way ANOVA 
would arise, for example, if we wished to determine whether both sex and race 
moderated the effect of treatments. We would have a treatment by sex by race de- 
sign. The focus in this chapter for higher order designs is not on calculating sums 
of squares for the various effects, but rather on interpreting effects that are signifi- 
cant. The fourth major section involves a comprehensive computer example that 
ties together various concepts that were discussed earlier in the chapter. The final 
major section deals with power analysis for two and three way ANOVA, and the 
use of SPSS MANOVA for obtaining power estimates is illustrated. 


4.2 NUMERICAL CALCULATIONS 
FOR TWO WAY ANOVA* 


Now that the reader has examples of two way ANOVAs in mind and reasons why a 
two way ANOVA is advantageous, we consider a small data set to illustrate what 
the hypotheses are that we are testing and how they are tested. In the one way 
ANOVA there were just two sources of variation (between and within). In a two 
way ANOVA (A x B design) there are four sources of variation: 


1. Variation due to factor A. 

2. Variation due to factor B. 

3. Variation due to the interactive effect of A and B. 

4. Within cell (error) variation. 


Consider the following 2 x 3 design with 3 observations per cell. 


*We again have the same assumptions as for a one way ANOVA, except now they apply to cells, that 
is, normality on the dependent variable in each cell and equal cell population variances. 
                                      Treatments (B) 

                          1                2                3           Row Means 
   Sex (A)  1 (males)     12, 16, 17       13, 9, 8         14, 15, 13  $\bar{x}_{1.} = 13$ 
                          $\bar{x}_{11} = 15$    $\bar{x}_{12} = 10$    $\bar{x}_{13} = 14$ 
            2 (females)   6, 10, 8         11, 8, 8         12, 10, 8   $\bar{x}_{2.} = 9$ 
                          $\bar{x}_{21} = 8$     $\bar{x}_{22} = 9$     $\bar{x}_{23} = 10$ 
   Column Means           $\bar{x}_{.1} = 11.5$  $\bar{x}_{.2} = 9.5$   $\bar{x}_{.3} = 12$    $\bar{x} = 11$ (grand mean) 


The first number in the subscript for each cell mean refers to the level for Sex, 
while the second number refers to the level for Treatment. Thus, $\bar{x}_{12}$ refers to the 
mean for males in treatment 2, while $\bar{x}_{23}$ refers to the mean for females in treatment 
3. The dot notation in the second part of the subscript means we are summing 
across the columns or levels of factor B in obtaining the row means. The dot 
notation in the first part of the subscript indicates we are summing across the rows or 
levels of factor A to obtain the column means. 

As for the one way ANOVA, we will use definitional formulas to compute the 
sums of squares for factor A, factor B, interaction, and error, that is, $SS_A$, $SS_B$, $SS_{AB}$, 
and $SS_w$. Before we give these formulas, let us have clearly in mind what 
hypotheses are being tested. The first two involve what are called main effects for factors A 
and B. The null hypothesis being tested for the A main effect is: 


   $H_0: \mu_{1.} = \mu_{2.} = \cdots = \mu_{I.}$   (population row means are equal) 


In the above table $\bar{x}_{1.} = 13 = \hat{\mu}_{1.}$ and $\bar{x}_{2.} = 9 = \hat{\mu}_{2.}$ are estimates of the population 
means for males and females, and the inferential question is, “Are the differences 
in the sample row means large enough, given sampling error, to suggest that the 
underlying population row means are different?” Also, I in the above null hypothesis 
refers to the general number of levels for factor A. 

The null hypothesis for the B main effect is: 


   $H_0: \mu_{.1} = \mu_{.2} = \cdots = \mu_{.J}$   (population column means are equal) 


In the above data table $\bar{x}_{.1} = 11.5 = \hat{\mu}_{.1}$, $\bar{x}_{.2} = 9.5 = \hat{\mu}_{.2}$, and $\bar{x}_{.3} = 12 = \hat{\mu}_{.3}$ are 
estimates of the underlying population column means. Also, J refers to the general 
number of levels for factor B. 
Thus, in general we are talking about an I x J design. We further restrict matters, 
at this point in the chapter, to what are called balanced designs, which are designs 
with an equal number of observations (n) per cell. 


Sums of Squares for Factor A and Factor B 
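The definitional formula for sum of squares for factor A ($SS_A$) is given by: 


   $SS_A = nJ \sum_{i=1}^{I} (\bar{x}_{i.} - \bar{x})^2$ 
        $= nJ[(\bar{x}_{1.} - \bar{x})^2 + (\bar{x}_{2.} - \bar{x})^2 + \cdots + (\bar{x}_{I.} - \bar{x})^2]$   (1) 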


Note that nJ is the number of observations on which each row mean is based. 
Thus, this sum of squares merely reflects variability of the row means about the 
grand mean. It is analogous to the sum of squares between in the one way ANOVA. 
For our example we have 

   $SS_A = 3(3)[(13 - 11)^2 + (9 - 11)^2] = 72$ 

We want the mean sum of squares for factor A ($MS_A$), which is given by 

   $MS_A = SS_A/(I - 1) = 72/1 = 72$ 
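The definitional formula for the sum of squares for factor B ($SS_B$) is given by 


   $SS_B = nI \sum_{j=1}^{J} (\bar{x}_{.j} - \bar{x})^2$ 
        $= nI[(\bar{x}_{.1} - \bar{x})^2 + (\bar{x}_{.2} - \bar{x})^2 + \cdots + (\bar{x}_{.J} - \bar{x})^2]$   (2) 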


This sum reflects variability of the column means about the grand mean. For 
our example it is 


   $SS_B = 3(2)[(11.5 - 11)^2 + (9.5 - 11)^2 + (12 - 11)^2]$ 
        $= 21$ 


We want the mean sum of squares for factor B ($MS_B$), which is 

   $MS_B = SS_B/(J - 1) = 21/2 = 10.5$ 

Error Term 


To test each of these main effects for significance we need an error term. That error 
term is a pooled within cell measure of variability. Verbally, for each cell we 
deviate the scores in the cell about the mean for the cell, square the deviations, and add 
them up across all the cells. Recall that exactly the same process was followed in 
obtaining the error term for the one way ANOVA, except that the scores were 
deviated about the group means. In symbols we can write the factorial error term as: 


   $SS_w = \sum_{\text{cells}} \sum (x - \bar{x}_{ij})^2$ 
        $= \sum (x - \bar{x}_{11})^2 + \sum (x - \bar{x}_{12})^2 + \cdots + \sum (x - \bar{x}_{IJ})^2$   (3) 

where the successive terms are the variability within cell 11, the variability within 
cell 12, and so on through the variability within cell IJ. 


Now we compute this for the example: 


   $SS_w = (12 - 15)^2 + (16 - 15)^2 + (17 - 15)^2$   (cell 11) 
        $+ (13 - 10)^2 + (9 - 10)^2 + (8 - 10)^2$   (cell 12) 
        $+ (14 - 14)^2 + (15 - 14)^2 + (13 - 14)^2$   (cell 13) 
        $+ \cdots + (12 - 10)^2 + (10 - 10)^2 + (8 - 10)^2$   (cell 23) 
   $SS_w = 52$ 


We want $MS_w$, which, as mentioned earlier, represents the average of the cell 
variances. This is given by 


   $MS_w = SS_w/(N - IJ)$ 


A degree of freedom is lost in estimating each cell mean, hence the degrees of 
freedom for error is N - IJ. For the example, we have 


   $MS_w = 52/(18 - 6) = 4.33$ 
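
The error term can be verified in two ways: by pooling the squared deviations about the cell means (Equation 3), and by averaging the six cell variances. The sketch below is illustrative; the list layout and variable names are assumptions.

```python
from statistics import variance

# The six cells of the 2 x 3 example, in the order 11, 12, 13, 21, 22, 23.
cells = [
    [12, 16, 17], [13, 9, 8], [14, 15, 13],
    [6, 10, 8],   [11, 8, 8], [12, 10, 8],
]

def mean(values):
    return sum(values) / len(values)

# Equation 3: squared deviations about each cell's own mean, summed over all cells.
ss_w = sum((x - mean(scores)) ** 2 for scores in cells for x in scores)

N = sum(len(scores) for scores in cells)   # 18 observations in total
df_error = N - len(cells)                  # N - IJ = 18 - 6
ms_w = ss_w / df_error

# For equal cell sizes, MS_w is also the simple average of the cell variances.
average_cell_variance = mean([variance(scores) for scores in cells])

print(ss_w, df_error, round(ms_w, 2), round(average_cell_variance, 2))
```

The two routes agree because, with n = 3 in every cell, each cell variance is that cell's sum of squares divided by the same n - 1 = 2.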


F Tests for the Main Effects 


The F tests for the main effects are analogous to doing two one way ANOVAs on 
the data, although the error term here is different. One can think of “slicing the data 
cake” first horizontally, and then vertically. The F ratio for the A main effect is 
given by 

   $F_A = MS_A / MS_w$ 


which for our data becomes 


   $F_A = 72/4.33 = 16.63$ 

The critical value at the .05 level is given by 

   $F_{.05;\, I-1,\, N-IJ} = F_{.05;\, 1,\, 12} = 4.75$ 

Because the value of the test statistic is greater than the critical value (i.e., 
16.63 > 4.75), we reject the null hypothesis and conclude that males did significantly 
better than females. 

The F ratio for the B main effect is given by: 

   $F_B = MS_B / MS_w$ 

which for this data is 

   $F_B = 10.5/4.33 = 2.42$ 


The critical value at the .05 level is given by 


   $F_{.05;\, J-1,\, N-IJ} = F_{.05;\, 2,\, 12} = 3.89$ 


Since 2.42 < 3.89, we fail to reject and conclude that treatments did not have a 
differential effect on performance. Or, to put it another way, the sample column 
means are estimating equal population column means.* 
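
The two decisions above follow the same rule: form the ratio of the effect mean square to $MS_w$ and compare it with the tabled critical value. The sketch below is a minimal illustration of that rule. The critical values are copied from the text rather than computed, $MS_w$ is rounded to 4.33 as in the text so that the ratios match, and the variable names are assumptions.

```python
MS_W = 4.33                                # error mean square, rounded as in the text
mean_squares = {"A": 72.0, "B": 10.5}      # MS_A and MS_B from the example
critical_05 = {"A": 4.75, "B": 3.89}       # F(.05; 1, 12) and F(.05; 2, 12), from the text

results = {}
for effect, ms in mean_squares.items():
    F = ms / MS_W
    reject = F > critical_05[effect]       # reject H0 when F exceeds the critical value
    results[effect] = (round(F, 2), reject)
    print(effect, round(F, 2), "reject H0" if reject else "fail to reject H0")
```

Using the unrounded error term 52/12 would change the ratios slightly in the second decimal place but not either decision.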


The Interaction Effect 


We will define the interaction sum of squares ($SS_{AB}$) in terms of the cell interaction 
effects. A cell interaction effect, which we denote by $\phi_{ij}$, is that part of the cell 
mean that cannot be accounted for by the overall effect (grand mean) and by the main 
effects for A and B. The main effects for A and B respectively are $\alpha_i = \mu_{i.} - \mu$ and 
$\beta_j = \mu_{.j} - \mu$, where $\mu$ is the grand (overall) population mean. Thus, the cell 
interaction effect is: 

   $\phi_{ij} = (\mu_{ij} - \mu) - (\mu_{i.} - \mu) - (\mu_{.j} - \mu)$ 
            $= \mu_{ij} - \mu_{i.} - \mu_{.j} + \mu$ 


Now, the sum of squares for interaction is 


   $SS_{AB} = n \sum_{i} \sum_{j} \hat{\phi}_{ij}^2$ 


*The reader needs to understand that although we are doing the tests on main effects first because of 
their simplicity, if a significant interaction effect is found then interpretation of the results needs to be 
focused on the interaction. An interaction effect means that the explanation of the data requires a more 
complex model. 
where 


   $\hat{\phi}_{ij} = \bar{x}_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x}$   (4) 

is the estimated cell interaction effect. 
Let us bring back the data again, but this time just with the row and column 
means and grand mean: 


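                                    (B) Treatments 


                          1                      2                       3                    Row Means 
   Sex (A)  1             $\bar{x}_{11} = 15$           $\bar{x}_{12} = 10$            $\bar{x}_{13} = 14$             13 
                          $\hat{\phi}_{11} = 1.5$       $\hat{\phi}_{12} = -1.5$       $\hat{\phi}_{13} = 0$ 
            2             $\bar{x}_{21} = 8$            $\bar{x}_{22} = 9$             $\bar{x}_{23} = 10$             9 
                          $\hat{\phi}_{21} = -1.5$      $\hat{\phi}_{22} = 1.5$        $\hat{\phi}_{23} = 0$ 
   Column Means           11.5                   9.5                     12                   11 (grand mean) 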

The estimated cell interaction effects above were obtained using Equation 4. To 
illustrate, we calculate the first two: 


   $\hat{\phi}_{11} = 15 - 13 - 11.5 + 11 = 1.5$ 

and 

   $\hat{\phi}_{12} = 10 - 13 - 9.5 + 11 = -1.5$ 


It can be shown that for a fixed effects design the interaction effects in every 
row and every column sum to 0. Thus, for the above design, once the above two 
effects are calculated the others are determined. For example, since the sum of the 
interaction effects for row 1 must be 0, we have 


   1.5 + (-1.5) + x = 0 or x = 0 


Similarly, since the sum of the interaction effects for column 1 must be 0, it 
follows that $\hat{\phi}_{21} = -1.5$. 
Plugging the interaction effects into the equation above: 


   $SS_{AB} = 3[(1.5)^2 + (-1.5)^2 + (-1.5)^2 + (1.5)^2]$ 
   $SS_{AB} = 3(9) = 27$ 
Now, the mean sum of squares (MSas) is given by 
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MS az = SSag [И — 1) —1) 


where $(I - 1)(J - 1)$ is the degrees of freedom for interaction. For this problem we
have

$$MS_{AB} = 27 / [(2 - 1)(3 - 1)] = 13.5$$


The F ratio for interaction is

$$F_{AB} = MS_{AB} / MS_w = 13.5 / 4.33 = 3.12$$

The critical value at the .05 level is given by

$$F_{.05;\,(I-1)(J-1),\,N-IJ} = F_{.05;\,2,\,12} = 3.89$$

Since 3.12 < 3.89, the interaction effect is not significant (the null hypothesis is
$H_0$: all $\phi_{ij} = 0$).
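
The hand calculation above can be checked with a few lines of code. What follows is only a minimal Python sketch, assuming the raw scores are those listed in the Table 4.1 data lines; the names `cells`, `phi`, `ss_ab`, and so on are illustrative and do not come from either package.

```python
# Raw scores for the 2 x 3 example, keyed by (row i, column j).
# Values are taken from the data lines shown in Table 4.1.
cells = {
    (1, 1): [12, 16, 17], (1, 2): [13, 9, 8], (1, 3): [14, 15, 13],
    (2, 1): [6, 10, 8],   (2, 2): [11, 8, 8], (2, 3): [12, 10, 8],
}

def mean(values):
    return sum(values) / len(values)

rows = sorted({i for i, _ in cells})
cols = sorted({j for _, j in cells})
n = 3  # subjects per cell (equal n design)

cell_mean = {ij: mean(scores) for ij, scores in cells.items()}
row_mean = {i: mean([cell_mean[i, j] for j in cols]) for i in rows}
col_mean = {j: mean([cell_mean[i, j] for i in rows]) for j in cols}
grand = mean(list(cell_mean.values()))

# Equation 4: estimated cell interaction effects.
phi = {(i, j): cell_mean[i, j] - row_mean[i] - col_mean[j] + grand
       for i in rows for j in cols}

# Interaction sum of squares, degrees of freedom, and mean square.
ss_ab = n * sum(effect ** 2 for effect in phi.values())
df_ab = (len(rows) - 1) * (len(cols) - 1)
ms_ab = ss_ab / df_ab

# Within-cell (error) sum of squares and mean square; df = N - IJ.
ss_w = sum((y - cell_mean[ij]) ** 2
           for ij, scores in cells.items() for y in scores)
df_w = n * len(cells) - len(cells)
ms_w = ss_w / df_w

f_ab = ms_ab / ms_w
print(phi)
print(ss_ab, ms_ab, round(ms_w, 2), round(f_ab, 2))
```

Under these assumptions the sketch reproduces the values in the text: an interaction sum of squares of 27, a mean square of 13.5, and an F ratio of about 3.12, which is then compared with the tabled critical value of 3.89.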

Another way of characterizing an interaction effect is as “a difference in the
differences.” The differences across sex for the three treatments are 7, 1, and 4,
respectively. Although these differences may appear to vary greatly, the test
statistic indicates that differences this discrepant could plausibly arise from
populations with equal differences. This is due to the considerable sampling error
present here because of the very small cell size of 3.*

Graphically, if there is no interaction, then the population profiles of means will
be parallel. When plotting real data, however, the profiles of sample means will
essentially always be non-parallel. For the previous example the interaction effect
was not significant at the .05 level, and yet when the profiles of means are plotted
(see Figure 4.1) for $A_1$ and $A_2$ they are strikingly non-parallel. For this example
there was a power problem due to small cell size (3), because the estimated
interaction effect size is very large: $f = \sqrt{(2 - 1)(3 - 1)(3.12)/18} = .59$.
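
As a quick check on the two quantities just discussed, the following Python sketch recomputes the sex differences within each treatment and the estimated effect size. It simply plugs in the cell means, F ratio, and total N quoted in the text; the variable names are illustrative assumptions.

```python
import math

# Cell means for the 2 x 3 example (rows: sex 1 and 2; columns: treatments 1-3).
cell_means = [[15, 10, 14],
              [8, 9, 10]]

# Difference across sex within each treatment ("difference in the differences").
sex_differences = [top - bottom for top, bottom in zip(*cell_means)]

# Estimated interaction effect size, f = sqrt(df_effect * F / N),
# using df = (2 - 1)(3 - 1), F = 3.12, and N = 18 as given in the text.
df_interaction = (2 - 1) * (3 - 1)
f_ratio = 3.12
total_n = 18
effect_size_f = math.sqrt(df_interaction * f_ratio / total_n)

print(sex_differences)
print(round(effect_size_f, 2))
```

The differences come out as 7, 1, and 4, and the effect size rounds to .59, in agreement with the values above.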

Still another example illustrating that the sample mean profiles will be
non-parallel even when the interaction F is less than 1 is provided later, where we
consider three way ANOVA. The social class by grade level interaction
F was .868. When the profiles of means were plotted (Figure 4.1) for social class
they even crossed, although just slightly. The F test is telling us, however, that these
sample mean profiles are estimating parallel population profiles.

Finally, note that the degrees of freedom for interaction, $(I - 1)(J - 1)$, is directly
related to the fact that the interaction effects in every row and column
must sum to 0. Recall from our example that once we had computed the first two
interaction effects, the others were determined. That is, they were not free to vary. This
is because the 2 x 3 design has only (2 − 1)(3 − 1) = 2 degrees of freedom for interaction.


*In the Appendix we illustrate how to do an equal n ANOVA using a calculator.


Linear Model for the Data 


Recall that the linear model for a subject's score ($y_{ij}$) in one way ANOVA was

$$y_{ij} = \mu + \alpha_j + e_{ij},$$

where $\mu$ was the grand mean (general effect), $\alpha_j = \mu_j - \mu$ was an effect unique to the
jth treatment, and $e_{ij}$ was random error. The subject's score was decomposed into
three components. In two way ANOVA we still have a linear model, although now
there will be a component for factor A, a component for factor B, and a component
to represent the joint effect of A and B. The model looks like this:

$$y_{ijk} = \underbrace{\mu}_{\text{general effect}} + \underbrace{\alpha_i + \beta_j}_{\text{main effects}} + \underbrace{\phi_{ij}}_{\text{interaction effect}} + \underbrace{e_{ijk}}_{\text{error}}$$

where $\alpha_i = \mu_{i.} - \mu$ (deviation of the ith row mean from the grand mean),
$\beta_j = \mu_{.j} - \mu$ (deviation of the jth column mean from the grand mean), and
$\phi_{ij}$ is the interaction effect, which was defined in the previous section. The
triple subscript for the subject's score is read, "the score for subject k in cell ij."

Now we show for a few selected subjects from the example in the section on
Interaction Effect how their scores can be decomposed into the four parts given by
the linear model for a two way ANOVA design. We start with the first subject in
cell 11. That score of 12 can be expressed as:

$$12 = \underbrace{11}_{\text{general effect}} + \underbrace{(13 - 11) + (11.5 - 11)}_{\text{main effects}} + \underbrace{1.5}_{\text{interaction effect}} + \underbrace{e_{111}}_{\text{error}} \qquad (5)$$

Thus, the error component for this subject is −3.
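Or, consider the second subject in cell 22. That score of 8 can be decomposed as
follows:

$$8 = \underbrace{11}_{\text{general effect}} + \underbrace{(9 - 11)}_{\text{A main effect}} + \underbrace{(9.5 - 11)}_{\text{B main effect}} + \underbrace{1.5}_{\text{interaction effect}} + \underbrace{e_{222}}_{\text{error}}$$

Therefore, the error component for this subject is −1.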
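
The decomposition can be carried out for any subject in the same way. The sketch below is an illustration only: it assumes the raw scores from the Table 4.1 data lines and uses a hypothetical helper, `decompose`, to split a score into the estimated general effect, the two main effects, the interaction effect, and the error. It is applied to the first score in cell 11 (a 12) and to the score of 8 in cell 22, the cell with mean 9.

```python
# Raw scores for the 2 x 3 example, keyed by (row i, column j); from Table 4.1.
cells = {
    (1, 1): [12, 16, 17], (1, 2): [13, 9, 8], (1, 3): [14, 15, 13],
    (2, 1): [6, 10, 8],   (2, 2): [11, 8, 8], (2, 3): [12, 10, 8],
}

def mean(values):
    return sum(values) / len(values)

def decompose(i, j, k):
    """Split the score of subject k in cell (i, j) into the estimated
    parts of the two way linear model. All indices are 1-based."""
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    cell_mean = {key: mean(scores) for key, scores in cells.items()}
    grand = mean(list(cell_mean.values()))
    a_effect = mean([cell_mean[i, c] for c in cols]) - grand   # row mean - grand mean
    b_effect = mean([cell_mean[r, j] for r in rows]) - grand   # column mean - grand mean
    interaction = cell_mean[i, j] - grand - a_effect - b_effect
    score = cells[i, j][k - 1]
    error = score - cell_mean[i, j]
    return {"score": score, "general": grand, "A": a_effect,
            "B": b_effect, "interaction": interaction, "error": error}

first_in_11 = decompose(1, 1, 1)
second_in_22 = decompose(2, 2, 2)
print(first_in_11)
print(second_in_22)
```

For each subject the four estimated parts plus the error add back to the observed score, and the two error components come out as −3 and −1.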
4.3 BALANCED AND UNBALANCED DESIGNS


Balanced Designs 


In one way ANOVA the total sum of squares was partitioned into two independent 
sources of variation (sum of squares between and within). In a two way ANOVA, 
for equal cell size, the sums of squares for the main effects, interaction, and error 
are also independent. That is, $SS_A$, $SS_B$, $SS_{AB}$, and $SS_w$ are independent. Of course,
the corresponding mean squares will also be independent. But the F ratios for A, B,
and the AB interaction are not independent. Why? Because they all share the same
error term, that is, $MS_w$. However, research has shown that if the total N is even
moderately large, then the amount of dependence will be small and can be ignored for
practical purposes. Thus, we will regard the F tests as independent. This is impor- 
tant in terms of clarity of interpretation, since significance on one effect is not de- 
pendent on (or confounded with) significance on the other effects. We see later on 
in this chapter that for disproportional cell sizes the effects are correlated or con- 
founded. 


Factorial ANOVA on SAS and SPSS 


Now we show how easy it is to run the data from the numerical example on SAS
and SPSS. The control lines, along with annotation, are given in Table 4.1.
Annotated printout from SAS is presented in Table 4.2 and printout for SPSS is given
below:


[Placeholder for shaded SPSS printout table from p. 158 of previous edition.]


TABLE 4.1
SAS and SPSS Control Lines for 2 x 3 Factorial ANOVA

SAS

        TITLE 'TWO WAY ANOVA';
        DATA TWOWAY;
    (1) INPUT FACA FACB DEP @@;
        LINES;
    (2) 1 1 12 1 1 16 1 1 17
        1 2 13 1 2 9 1 2 8
        1 3 14 1 3 15 1 3 13
        2 1 6 2 1 10 2 1 8
        2 2 11 2 2 8 2 2 8
        2 3 12 2 3 10 2 3 8
        PROC PRINT;
        PROC GLM;
    (3) CLASS FACA FACB;
    (4) MEANS FACA FACB FACA*FACB;
    (5) MODEL DEP = FACA FACB
        FACA*FACB;

SPSS GLM

        TITLE 'TWO WAY ANOVA - P. 159'.
        DATA LIST FREE/FACA FACB DEP.
        BEGIN DATA.
        1 1 12 1 1 16 1 1 17
        1 2 13 1 2 9 1 2 8
        1 3 14 1 3 15 1 3 13
        2 1 6 2 1 10 2 1 8
        2 2 11 2 2 8 2 2 8
        2 3 12 2 3 10 2 3 8
        END DATA.
        LIST.
    (6) UNIANOVA DEP BY FACA
        FACB/DESIGN/.

(1) Recall that the @@ is necessary in order to put the data for more than one subject on the same
data line.

(2) The first two numbers of each triple here are the cell ID for the subject. Thus, the first triple
here indicates the first subject in cell 11 has a score of 12 on the dependent variable, the second triple
that the score for the next subject in cell 11 is 16, and the first triple for the fourth data line that the score
for the first subject in cell 21 is 6.

(3) This CLASS statement indicates which of the variables in the INPUT statement are the grouping
variables.

(4) This MEANS statement is required to obtain the level (row and column) means and the cell
means.

(5) In the MODEL statement we put the dependent variable(s) on the left side and the effects in the
design on the right side. Since we wish to test the full factorial model we put in the main effects and the
interaction. Observe that the interaction effect is indicated by placing an * between the factors.

(6) We are using the GLM (general linear model) program, which does both regression analysis and
ANOVA for one dependent variable. That is why the code name is UNIANOVA, for ANOVA on one
dependent variable.


Unbalanced Designs


For the equal cell size designs the sums of squares for the different effects are
uncorrelated, and for moderately large N the F ratios for the effects are essentially
uncorrelated. This is important in interpreting results since significance on one
effect implies nothing about significance on another. This makes for a clean and
clear interpretation of results. However, often in real world data situations we will
not have equal cell size, for at least two reasons:


1. Even if we started with equal cell size in an experimental study, because of
experimental mortality (subjects dropping out of the study for various reasons, such as
parents moving, boredom, annoyance with the treatment, or the treatment schedule
becoming inconvenient) we wind up with unequal cell sizes.

2. We are studying intact groups, which when cross-classified produce quite
different subgroup (cell) sizes. Of course, we could in some instances simply
randomly discard subjects from cells to achieve equal n, but in other cases this may
cause a loss of too many subjects.


Thus it becomes imperative to be able to analyze and properly interpret unequal 
cell size factorial designs. The problem with disproportional cell size designs is 
that the effects become correlated (confounded), and unless these correlations are 
taken into account we may misinterpret the results. There is a considerable amount 
of literature on the topic, particularly from the late 1960s through the 1970s. Over- 
all and Spiegel (1969), in a classic paper on analyzing factorial designs, discuss 
three basic methods of analysis: 


Method 1: Adjust each effect for all other effects in the design to obtain its 
unique contribution (regression approach). 

Method 2: Estimate the main effects ignoring the interaction, but estimate the 
interaction effect adjusting for the main effects (experimental method). 
Method 3: Based on theory and/or previous research, establish an ordering for 
the effects, and then adjust each effect only for those effects preceding it in the 
ordering (hierarchical approach). 


For equal cell size designs all three of the above methods yield the same results, 
that is, the same F tests. Therefore, it will not make any difference, in terms of the 
conclusions a researcher draws, which of these methods is used on one of the pack- 
ages. For unequal cell sizes, however, these methods can yield quite different re- 
sults. 


Two Examples for Unbalanced Designs 


We give two examples for unequal n factorial designs. The first example uses arti- 
ficial data, but shows that the method of analysis can affect the conclusions drawn. 
The second example uses real data, and also illustrates bringing data in from disk. 
For our first example, consider the following 2 x 3 design: 


                                    B

        3, 5, 6        |  2, 4, 8               |  11, 7, 8, 6, 9
A       ---------------+------------------------+------------------
        9, 14, 5, 11   |  6, 7, 7, 8, 10, 5, 6  |  9, 8, 10
The control lines for running the analysis on SPSS and on SAS are the same as
for the equal cell n case. With both programs the regression approach (Method 1) is
the default option; that is, it is used automatically unless something else is
specified. In both programs the unique sum of squares (regression approach) is called
type III sum of squares. Type I sum of squares in both programs refers to the
sequential sum of squares. In the sequential approach (also called hierarchical) a
given effect is adjusted for all effects to its left (or preceding it) in the ordering.
Suppose the effects went in the following order: FACA, FACB, FACA * FACB.

Then, in the sequential approach, the A main effect is not adjusted for anything.
The B main effect is adjusted for the A main effect, and the interaction is adjusted
for both main effects. In this approach the sums of squares for the terms in the
model do add up to the model sum of squares, so that together with the error sum of
squares they account for the total sum of squares.

We ran the above data on SAS GLM and on the SPSS GLM program (SPSS
Base, p. 239). Both sums of squares come out in one run on SAS; recall that type III
sum of squares is the unique sum of squares, while type I is the sequential sum of
squares. Two runs are required for SPSS, although we just present the type III
(unique) sum of squares for the SPSS run in Table 4.3.

If we use the unique sum of squares approach we would conclude that only the
factor A main effect is significant at the .05 level of significance, because only that
p value is less than .05. On the other hand, with the sequential sum of squares the
conclusion would be that both main effects are significant at the .05 level (p values
of .048 and .043, respectively). Thus, the method used with disproportional
designs can make a difference in terms of the conclusions drawn from an experiment.
Importantly, however, the interaction F is the same for both approaches, because
all other effects (main effects) are partialed from it with the unique sum of squares
approach, and also both main effects are partialed in the sequential approach since
the interaction effect is last in the ordering.
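
To see where the two kinds of sums of squares come from, it can help to compute them directly as differences between error sums of squares from nested regression models. The following Python sketch does this for the unequal n 2 x 3 data above. It is not the algorithm used by SAS or SPSS; it assumes effect (sum-to-zero) coding of the factors and uses a small hand-written least squares routine, and all function and variable names (`effect_codes`, `sse`, `type1`, `type3`) are illustrative.

```python
# Unequal n 2 x 3 data from the text, keyed by (level of A, level of B).
cells = {
    (1, 1): [3, 5, 6],      (1, 2): [2, 4, 8],              (1, 3): [11, 7, 8, 6, 9],
    (2, 1): [9, 14, 5, 11], (2, 2): [6, 7, 7, 8, 10, 5, 6], (2, 3): [9, 8, 10],
}

def effect_codes(i, j):
    """Effect (sum-to-zero) codes for a subject in cell (i, j)."""
    a = 1.0 if i == 1 else -1.0
    b1 = {1: 1.0, 2: 0.0, 3: -1.0}[j]
    b2 = {1: 0.0, 2: 1.0, 3: -1.0}[j]
    return {"A": [a], "B": [b1, b2], "AB": [a * b1, a * b2]}

subjects = [(effect_codes(i, j), score)
            for (i, j), scores in cells.items() for score in scores]
y = [score for _, score in subjects]

def sse(effects):
    """Residual sum of squares after regressing y on an intercept plus
    the coded columns of the named effects."""
    X = [[1.0] + [x for name in effects for x in codes[name]]
         for codes, _ in subjects]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    aug = [[sum(row[r] * row[c] for row in X) for c in range(p)]
           + [sum(row[r] * yi for row, yi in zip(X, y))] for r in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(p):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    beta = [aug[r][p] / aug[r][r] for r in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

full = sse(["A", "B", "AB"])  # error sum of squares for the full model

# Sequential (type I) sums of squares for the ordering A, B, A*B.
type1 = {
    "A": sse([]) - sse(["A"]),
    "B": sse(["A"]) - sse(["A", "B"]),
    "AB": sse(["A", "B"]) - full,
}

# Unique (type III) sums of squares: each effect adjusted for all the others.
type3 = {
    "A": sse(["B", "AB"]) - full,
    "B": sse(["A", "AB"]) - full,
    "AB": sse(["A", "B"]) - full,
}

print("SS error:", round(full, 3))
for name in ("A", "B", "AB"):
    print(name, round(type1[name], 3), round(type3[name], 3))
```

In this sketch the interaction sum of squares is the same in both dictionaries, since it is adjusted for both main effects either way, while the main effect sums of squares differ because of the unequal cell sizes. Each F ratio would then be formed by dividing the effect mean square by the full-model error mean square.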

Our research example involves data from a study by Philips and Jahanshahi of 
the London University Institute of Psychiatry (Hand & Taylor, 1987). The study 
examined the effectiveness of different kinds of psychological treatment on the 
sensitivity of headache sufferers to noise. Each subject was first pretested on sensi- 
tivity, then given relaxation training (to be defined shortly), then given one of 4 
treatments, and finally posttested on sensitivity. The sensitivity scores were ob- 
tained by listening to a tone that gradually increased in volume and having the sub- 
jects rate the levels at which the tone became (1) uncomfortable and (2) definitely 
unpleasant. These ratings are the dependent variables for the study. We denote the 
pretest and posttest ratings by PREU, PREUP, POSTU, and POSTUP. 


1. The subjects were asked to listen to the tone at their definitely unpleasant
level for up to two minutes, with the option of terminating the exposure if
they wished.

2. The subjects were then given instruction on breathing techniques and the
use of visual imagery to act as a controlled distraction.


The design was a 2 x 4 factorial because there were two types of headache suf- 
ferers involved (migraine and tension) and four treatments, which were as follows: 


T1: Subjects in this group listened to the tone again at their initial definitely
unpleasant (POSTUP) level for the length of time that they were able to stand it in
the relaxation training phase.

T2: This treatment was the same as T1, but with one extra minute's exposure to
the tone.

T3: The subjects in this treatment group had the same exposure to the tone as
those in treatment group 2, but they were instructed to use the relaxation
techniques of breathing and visual imagery.

T4: This was a control group, in that the subjects had no exposure to the tone
between the relaxation training and the posttest measurement.


From within migraine and tension, the subjects were randomly assigned to the 
treatment groups. However, missing data reduced an initial balanced design to the 
following unequal n situation: 


              T1    T2    T3    T4
Migraine      11    11    12    11
Tension       14    11    16    12


Here we consider analysis on only the definitely unpleasant posttest rating 
(DEFUNPL) of the subjects. The raw data for this study is given in Appendix A in
the back of the book. I had the data on a 3.5-inch disk, along with several other data 
sets, and ran the analysis using the GLM program of SPSS for Windows 12.0. In 
Table 4.4 I present selected printout from that run. 

For those who are interested, Stevens (1996, pp. 294-301) shows through 
dummy coding of the effects that the effects are indeed uncorrelated for balanced 
designs and correlated for disproportional designs. 


Which Method Should Be Used? 


After much debate in the statistical literature in the 1970s, there now seems to be a
consensus that Method 1 (obtaining the unique sum of squares for each effect)
generally should be used. For example, this is what Carlson and Timm (1974)
recommend, and what Myers (1979) recommends for experimental studies (random
assignment involved), or, as he puts it, “whenever variations in cell frequencies can
reasonably be assumed due to chance” (p. 403).

TABLE 4.4
Selected Printout From SPSS GLM for Headache Data 


Placeholder for art p. 165 of previous edition 


When an a priori ordering of the effects can be established, then Method 3 (hier- 
archical or sequential sum of squares) makes sense. Pedhazur (1982) gives the fol- 
lowing example. There is a 2 x 2 design in which one of the classification variables 
is race (black or white) and the other classification variable is education (high 
school or college). The dependent variable is income. In this case one can argue 
that race affects one's level of education, but obviously not vice versa. Thus, it 
makes sense to enter race first to determine its effect on income, then to enter edu- 
cation to determine how much it adds in predicting income. Finally, the race x edu- 
cation interaction is entered. 


4.4 HIGHER ORDER DESIGNS 


Three Way Analysis of Variance 


Here we are examining the effect of three independent variables or factors on some 
dependent variable. We present three examples to illustrate: 


1. An instructional technologist wishes to determine whether teaching 
method (2), teacher (2), and sex of the child each have an effect on achievement in 
reading. She has a 2 x 2 x 2 factorial design. This design enables her to determine 
whether all 3 factors jointly affect achievement in some unique way. For example, 
perhaps method 1 is particularly effective with teacher 1 working with girls, while
method 2 is not effective with teacher 2 working with boys. 

2. Consider again the aptitude treatment interaction study by Daniels and 
Stevens (1976) mentioned earlier. That study examined the effect of locus of
control and teaching method on achievement in an introductory psychology course,
and found a disordinal interaction. Internals did better with the contract for grade 
plan while externals did better with the teacher controlled method of instruction. 
As a heuristic followup to their study, Daniels and Stevens broke the subjects down 
into males and females and ran a sex by locus of control by method ANOVA to de- 
termine whether the nature of the interaction might be different for males and fe- 
males (it was not). 

3. A study by Marwit and Neumann (1974) provides another illustration of a
three way ANOVA. Two black and two white examiners administered standard 
and nonstandard English forms of the California Reading Test to 60 black and 53 
white second graders from a St. Louis public school. Here the race of the subject is 
one factor, the race of the examiner the second factor, and the format the third fac- 
tor, while achievement is the dependent variable. The design schematically is this: 


                                      Format
Subject    Examiner    Standard English    Nonstandard English
Black      Black
           White
White      Black
           White


Recall that in a one way ANOVA there were two sources of variation (between 
and within), in a two way ANOVA there were four sources of variation (factor A, 
factor B, interaction of A and B, and within cell or error variation), and three
hypotheses that were tested: A and B main effects and interaction effect. How many
sources of variation are there in a three way design? The number of sources of
variation in general for a k way factorial design is $2^k$. Thus, for a 3 way design there are
$2^3 = 8$ sources of variation, while for a 4 way ANOVA there are $2^4 = 16$ sources of
variation. Consider the methods x teacher x sex design again. The sources of
variation are


METHOD (A)
TEACHER (B)               } MAIN EFFECTS
SEX (C)

METHOD x TEACHER
METHOD x SEX              } FIRST ORDER INTERACTIONS
TEACHER x SEX

METHOD x TEACHER x SEX

WITHIN CELLS (ERROR)
Each of the 7 effects in the design is tested against the same error term, that is,
within cells variability ($MS_w$). Thus, the F ratios would look like this:
$F_A = MS_A/MS_w$, $F_B = MS_B/MS_w$, ..., $F_{BC} = MS_{BC}/MS_w$, $F_{ABC} = MS_{ABC}/MS_w$. The process
of computing $SS_w$ is exactly the same as for the two way ANOVA, that is, deviate
the scores about the means in each cell, square the deviations, and then add the
squared deviations. The degrees of freedom for error for the two way ANOVA was
N − IJ (total number of subjects − number of cells). If we denote the number of
levels for the factors in a 3 way ANOVA by I, J, and K, then the degrees of freedom for
error in a 3 way is N − IJK (again, the total number of subjects − number of cells).
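
The counting rules above are easy to verify by enumeration. The sketch below, with hypothetical helper names `sources_of_variation` and `error_df`, lists every main effect and interaction for a full factorial design, appends the error term, and computes the error degrees of freedom as the total number of subjects minus the number of cells.

```python
from itertools import combinations

def sources_of_variation(factors):
    """All main effects and interactions for a full factorial design,
    followed by the within cells (error) term."""
    effects = [" x ".join(combo)
               for order in range(1, len(factors) + 1)
               for combo in combinations(factors, order)]
    return effects + ["WITHIN CELLS (ERROR)"]

def error_df(total_n, levels):
    """Degrees of freedom for error: total N minus the number of cells."""
    n_cells = 1
    for count in levels:
        n_cells *= count
    return total_n - n_cells

sources = sources_of_variation(["METHOD", "TEACHER", "SEX"])
for source in sources:
    print(source)

print(len(sources) == 2 ** 3)     # 7 effects plus error
print(error_df(18, [2, 3]))       # the 2 x 3 example with 3 subjects per cell
print(error_df(144, [2, 2, 3]))   # a 2 x 2 x 3 design with 12 subjects per cell
```

For the 2 x 3 example with N = 18 this gives the 12 error degrees of freedom used earlier in the chapter.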

The main effects involve comparing level means, analogous to comparing row 
and column means for the two way ANOVA. The first order (or two way) interac- 
tions are assessed by examining the pattern of means for the two factors combined 
over the third factor. For example, the method x teacher interaction is assessed by 
examining the means for those two factors with boys and girls combined together. 
Finally, the three way interaction is going to tell us whether the patterns of means
for any two factors differ across the levels of the third factor.


Interpretation of Effects for An Example from Literature 


To make this more concrete we consider data from a study by Cradler and 
Goodwin (1971). They were interested in comparing the way in which three types 
of reinforcement affected children's ability to use the word they when making up 
sentences. The children were randomly assigned to three groups: (1) material rein- 
forcement condition—subjects received an M&M candy immediately after using 
the word they at the beginning of a sentence; (2) praise reinforcement—the chil- 
dren were reinforced by the experimenter's saying "good"; and (3) symbolic rein- 
forcement—the children were simply given a plus mark. The investigators were 
also interested in whether the reinforcements worked differently for middle and 
lower class children (second factor in the design), and for different aged children 
(2nd and 6th graders—the third factor in the design). Below are the means (M) and 
standard deviations (SD): 


                                  Grade Level

Social                 Grade 2                       Grade 6
Class          Mat.    Praise    Symb.       Mat.    Praise    Symb.
Middle
   M           5.66     6.64     6.58        5.75     8.25     9.66
   SD          2.32     3.28     2.98        1.63     4.12     3.37
Lower
   M           8.41     5.41     5.25        6.75     7.00     6.33
   SD          4.23     3.63     2.74        4.20     4.16     3.22


There were 12 subjects in each of the cells. The following ANOVA table was 
obtained: 




Source                      df       MS         F
Social Class (A)             1     12.250      .957
Grade Level (B)              1     25.000     1.954
Type of Reinforce (C)        2      1.465      .114
A x B                        1     11.111      .868
A x C                        2     41.646     3.255*
B x C                        2     47.396     3.704*
A x B x C                    2      5.298      .414
Error                      132     12.792


Note that none of the main effects is significant, although grade level is closest. 
Recall that in three way ANOVA main effects test whether the underlying popula- 
tion level means are different for the factor under consideration. The level mean for 
grade 2 is obtained by adding up all the means for grade 2 and dividing by 6: 


(5.66 + 6.64 + 6.58 + 8.41 + 5.41 + 5.25)/6 = 6.325


and similarly for grade 6. 
The reinforcement level means are obtained by adding the 4 means for each re- 
inforcement condition and then dividing by 4. Thus, for material, we have 


(5.66 + 8.41+5.75 + 6.75)/4 = 6.6425 


The social class level means are obtained by adding up the 6 means for each so- 
cial class across grades and reinforcement conditions. Thus, for middle social class 
we have: 


(5.66 + 6.64 + 6.58 + 5.75 + 8.25 + 9.66) /6 = 7.09 


All the sample level means are given below: 


Grade Level Means: Grade 2: 6.325, Grade 6: 7.29 
Social Class Means: Middle: 7.09, Lower: 6.525 
Reinforcement Level Means: Mat.: 6.643, Praise: 6.825, Sym: 6.955 


The reinforcement means are very close, making it quite likely that they are
estimating equal population values, which is reflected in the very small F = .114.
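
Because the cell sizes are equal, each level mean is just the average of the cell means that share that level. The following Python sketch illustrates this with the twelve cell means from the study; the dictionary layout and the helper `level_means` are assumptions made for the illustration.

```python
# Cell means keyed by (social class, grade, reinforcement).
# The lower class, grade 2, symbolic mean is entered as 5.25,
# the value used in the worked computations in the text.
cell_means = {
    ("middle", 2, "mat"): 5.66, ("middle", 2, "praise"): 6.64, ("middle", 2, "symb"): 6.58,
    ("middle", 6, "mat"): 5.75, ("middle", 6, "praise"): 8.25, ("middle", 6, "symb"): 9.66,
    ("lower", 2, "mat"): 8.41,  ("lower", 2, "praise"): 5.41,  ("lower", 2, "symb"): 5.25,
    ("lower", 6, "mat"): 6.75,  ("lower", 6, "praise"): 7.00,  ("lower", 6, "symb"): 6.33,
}

def level_means(position):
    """Average the cell means over all cells that share a level of the
    factor found at `position` in the key (equal cell sizes assumed)."""
    groups = {}
    for key, value in cell_means.items():
        groups.setdefault(key[position], []).append(value)
    return {level: sum(values) / len(values) for level, values in groups.items()}

social_class = level_means(0)
grade = level_means(1)
reinforcement = level_means(2)

print({k: round(v, 3) for k, v in grade.items()})
print({k: round(v, 3) for k, v in social_class.items()})
print({k: round(v, 4) for k, v in reinforcement.items()})
```

With unequal cell sizes these simple averages of cell means would be unweighted means, so the shortcut applies as written only to a balanced design such as this one.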

Now, let us turn to the 2 way interaction effects that were significant, that is, AC 
(Social class x reinforcement) and BC (grade x reinforcement). To interpret these 
we need the means for social class by reinforcement combined over grade and the 
means for grade by reinforcement combined over social class. These means are 
presented below: 
               Reinforcement                              Reinforcement
            Mat.   Praise   Symb.                      Mat.   Praise   Symb.
Middle      5.71    7.45    8.12         Grade 2       7.04    6.03    5.92
Lower       7.58    6.21    5.79         Grade 6       6.25    7.63    8.00


The mean of 5.71 for the middle class and material reinforcement cell is ob- 
tained by adding the means for this set of conditions for the two grades, that is, 
(5.66 + 5.75)/2 = 5.71, and similarly for the other means. The mean of 7.04 for the 
grade 2 by material reinforcement condition is obtained by adding the means for
this set of conditions for the two social classes, i.e., (5.66 + 8.41)/2 = 7.04, and
similarly for the other means.

The interaction for social class x reinforcement is disordinal. That is, the lower 
class children respond better to the material reinforcement and then the means 
“flip flop”; the middle class children respond better to the praise and symbolic re- 
inforcement. 

The grade by reinforcement interaction is also disordinal; that is, the younger
children respond better to the material reinforcement, and then the means “flip
flop”; the older children respond better to praise and symbolic reinforcement.
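
The collapsed means and the "flip flop" pattern can be checked numerically. The sketch below averages the cell means over the omitted factor and then tests whether the difference between two profiles changes sign across the reinforcement conditions, which is what a disordinal (crossing) pattern amounts to. The helpers `collapse` and `lines_cross` are hypothetical names introduced for this illustration.

```python
# Cell means keyed by (social class, grade, reinforcement); the lower class,
# grade 2, symbolic mean is 5.25, as used in the worked computations.
cell_means = {
    ("middle", 2, "mat"): 5.66, ("middle", 2, "praise"): 6.64, ("middle", 2, "symb"): 6.58,
    ("middle", 6, "mat"): 5.75, ("middle", 6, "praise"): 8.25, ("middle", 6, "symb"): 9.66,
    ("lower", 2, "mat"): 8.41,  ("lower", 2, "praise"): 5.41,  ("lower", 2, "symb"): 5.25,
    ("lower", 6, "mat"): 6.75,  ("lower", 6, "praise"): 7.00,  ("lower", 6, "symb"): 6.33,
}

def collapse(keep):
    """Average the cell means over the factor not listed in `keep`
    (`keep` holds the key positions of the two factors retained)."""
    groups = {}
    for key, value in cell_means.items():
        reduced = tuple(key[p] for p in keep)
        groups.setdefault(reduced, []).append(value)
    return {k: sum(v) / len(v) for k, v in groups.items()}

def lines_cross(table, line_a, line_b, conditions):
    """True if the difference between two profiles changes sign
    somewhere across the listed conditions."""
    diffs = [table[line_a, c] - table[line_b, c] for c in conditions]
    return min(diffs) < 0 < max(diffs)

class_by_reinf = collapse((0, 2))   # combined over grade
grade_by_reinf = collapse((1, 2))   # combined over social class
conditions = ("mat", "praise", "symb")

print({k: round(v, 3) for k, v in class_by_reinf.items()})
print({k: round(v, 3) for k, v in grade_by_reinf.items()})
print(lines_cross(class_by_reinf, "lower", "middle", conditions))
print(lines_cross(grade_by_reinf, 2, 6, conditions))
```

Both checks report a crossing, which matches the description of the two interactions as disordinal. A crossing in the sample means says nothing by itself about significance; that is still judged by the F tests in the ANOVA table.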


The Three Way Interaction Effect 


Why was the three way interaction not significant? As was mentioned earlier, a 
significant three way interaction implies that the two way interaction profiles are 
different for different levels of the third factor. If the patterns of means (profiles) are 
similar, then no interaction will be found. We present the means again below: 


                     Grade 2                       Grade 6
             Mat.    Praise    Symb.       Mat.    Praise    Symb.
Middle       5.66     6.64     6.58        5.75     8.25     9.66
Lower        8.41     5.41     5.25        6.75     7.00     6.33


Note that the profile of means for second graders is very similar to that for sixth
graders. At both grade levels the mean is higher for the lower social class under
material reinforcement, and then the ordering reverses, with the middle social class
mean higher under praise and symbolic reinforcement. That is, for both the second
and sixth grade we have the same type of disordinal interaction.

Now we consider two hypothetical situations in which a significant three way
interaction is present, and illustrate these graphically. The first example is a sex x
treatment x race design while the second example involves a counseling methods
x counselors x sex design. Suppose that the means for sex x treatment (collapsed
on race) and for counseling methods x counselors (collapsed on sex) were
as follows:


        Example 1                        Example 2
             T1     T2                        C1     C2
Males        60     50         Method 1       85     72.5
Females      40     42         Method 2       75     76


Example 1 shows a clear ordinal interaction while example 2 shows a disordinal
interaction, but neither of these tells the whole story. We now present the two way
profiles of means for whites and blacks for example 1 and for males and females
for example 2:


              Whites          Blacks                         Males          Females
             T1    T2        T1    T2                       C1    C2        C1    C2
Male         65    50        55    50        Method 1       80    70        90    75
Female       40    47        40    37        Method 2       70    78        80    74


For example 1 we can see that the profiles of means for whites and blacks are
distinctly different. Race further moderates the sex by treatment interaction. For 
whites we have a strong ordinal interaction while for blacks there is no interaction 
effect. For the counseling example (example 2), we see that the disordinal interac- 
tion effect apparent when males and females were combined together was due to 
the males. We see a clear disordinal method by counselor interaction for males, 
while for females there is an ordinal interaction. Here, sex moderates the method 
by counselor interaction. The practical implications for this example are that coun- 
selor 1 does uniformly better with method 1 regardless of sex. Method 2 is optimal 
for counselor 2 working with males, but for females counselor 2 is equally effec- 
tive with both methods. We display graphically the interaction profiles for these 
two examples in Figure 4.2. It is important to emphasize again that a significant 
three way interaction means that the two way interaction profiles for any two of the 
factors are different for the levels of the third factor. Thus, in the sex by treatment 
by race example above, a significant three way interaction means that: 


1. The sex by treatment profiles are different for the races (which is what we 
illustrated). 

2. The sex by race profiles are different for the treatments. 

3. The treatment by race profiles are different for the two sexes. 


In the context of aptitude-treatment interaction (ATI) research, Cronbach 
(1975) had an interesting way of characterizing higher order interactions: 
When ATI’s are present, a general statement about a treatment effect is misleading
because the effect will come or go depending on the kind of person treated. ... An ATI
result can be taken as a general conclusion only if it is not in turn moderated by
further variables. If Aptitude x Treatment x Sex interact, for example, then the Aptitude
x Treatment effect does not tell the story. Once we attend to interactions, we enter a
hall of mirrors that extends to infinity. (p. 119)


Interpreting Patterns of Significant Effects 


To further continue our discussion of interpreting effects from a three way 
ANOVA, we consider an A (method) x B (teacher) x C (sex) example. We examine
three possible patterns of significant results, and how one would interpret those 
patterns. 


Pattern 1 
A*(method) 
B(teacher) 
C(sex) 
AB 
AC 
BC 
ABC 
*p < .05


Here only the method main effect is significant. Since there are no significant 
interactions we needn't qualify our statement on the efficacy of methods. It can be 
stated that one method produces uniformly higher achievement than the other re- 
gardless of teacher and regardless of sex of the child. A pattern of means that 
would be congruent with the above results is 


          T1    T2
M1        72    70
M2        63    62


[Figure 4.2 appears here. Top panels: sex by treatment profiles for each race (Whites, Blacks), with separate lines for males and females plotted against treatments. Bottom panels: counseling method by counselor profiles for each sex (Males, Females), with separate lines for Method 1 and Method 2 plotted against counselor.]

FIGURE 4.2 Two Way Interaction Profiles for Sex by Treatment by Race Design and for the
Counseling by Counselor by Sex Design


Pattern 2
A*(method)
B(teacher)
C(sex)
AB*
AC
BC
ABC
*p < .05


Here we have a method main effect again, but this time there is also a significant 
method by teacher interaction. Thus the efficacy of method needs to be qualified. 
The interaction is telling us that how much better one method is than another
depends on the teacher. A pattern of means congruent with the above would be


          T1    T2
M1        70    65      67.5
M2        60    62      61.0
          65    63.5


Method 1 is superior to method 2; however, the degree of superiority depends
sharply on the teacher. For teacher 1 method 1 is vastly superior, while for teacher
2 method 1 is only slightly better than method 2.


Pattern 3 
A(method) 
B*(teacher) 
C(sex) 
AB 
AC 
BC 
ABC* 
*p < .05 


The teacher main effect here needs to be considerably qualified because of the 
significant three way interaction. This could be discussed in terms of differences 
between two way profiles, as was done previously. Or we might think of it as fol- 
lows. Call the two teachers Ms. Jones and Mr. Morton. The main effect is telling us 
that one teacher tends to get higher achievement regardless of method and sex of 
child (suppose this is Ms. Jones). The three way interaction is telling us that how 
much better achievement Ms. Jones obtains depends on both the method being 
taught and on the sex of the child. For example, Ms. Jones might do much better 
than Mr. Morton working with method 1 and girls, while she gets only slightly
better achievement working with method 2 and boys. 


Three Way ANOVA on SAS and SPSS 


To illustrate the setup of the control lines for running a three way ANOVA on SAS 
we consider the following sex x age x treatment data set: 


                            Treatments
           Age      1           2           3

           14       4,6,9       2,3,8       11,9,16
Males               (111)       (112)       (113)
           17       9,11,8      11,7,8      10,14,9
                    (121)       (122)       (123)

           14       10,2,8      12,7,15     3,7,4
Females             (211)       (212)       (213)
           17       7,6,12      9,11,7      10,15,8
                    (221)       (222)       (223)


The numbers in parentheses are the cell identifications, and are very important 
in identifying to the packages where the data originate. The 111 cell ID means the 
first level for each factor, while 113 means the first level for factors 1 and 2 and the 
third level for factor 3, and 213 means the subject is in the second level for factor 1 
(female), the first level for factor 2 (age 14) and the third level for factor 3 (treat- 
ment 3). Once the cell identification is clear the rest of the setup is relatively 
straightforward (see Table 4.5). 
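
To make the cell identification concrete, here is a small sketch in Python (not part of the SAS or SPSS setup) that expands the data table into one record per subject, each consisting of the three-digit cell ID followed by the score. The names cells and records are ours.

```python
# Scores keyed by cell ID (sex, age, treatment), copied from the data table above.
cells = {
    (1, 1, 1): [4, 6, 9],   (1, 1, 2): [2, 3, 8],    (1, 1, 3): [11, 9, 16],
    (1, 2, 1): [9, 11, 8],  (1, 2, 2): [11, 7, 8],   (1, 2, 3): [10, 14, 9],
    (2, 1, 1): [10, 2, 8],  (2, 1, 2): [12, 7, 15],  (2, 1, 3): [3, 7, 4],
    (2, 2, 1): [7, 6, 12],  (2, 2, 2): [9, 11, 7],   (2, 2, 3): [10, 15, 8],
}

# One record per subject: SEX AGE TREAT Y.
records = [(sex, age, treat, y)
           for (sex, age, treat), scores in cells.items()
           for y in scores]

# Show the three subjects in cell 111.
for record in records[:3]:
    print(*record)
```

Each printed line has the form "1 1 1 4": the first three numbers locate the subject in the design and the fourth is the score on the dependent variable.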

Selected printout from SPSS GLM for Windows 12.0 for this data is given in 
Table 4.6. The SPSS options screen (used for obtaining marginal means) is given 
in Table 4.7. 


Calculation of Sums of Squares in Three Way ANOVA 


We illustrate here, using definitional type formulas, how some of the sums of 
squares given in Table 4.6 are obtained, and leave the calculation of the others as 
exercises. In doing this we link the process to what was done for two way ANOVA, 
since it is similar. Recall that earlier in calculating the sum of squares for the main
effects for A and B the definitional formulas were


$$SS_A = nJ \sum_i (\bar{x}_{i.} - \bar{x})^2 \quad \text{and} \quad SS_B = nI \sum_j (\bar{x}_{.j} - \bar{x})^2$$

That is, the row and column means were deviated about the grand mean, and the
weighting factor in each case is the number of observations on which each row


TABLE 4.5
SAS Control Lines for Sex(2) x Age(2) x Treat(3) ANOVA


    TITLE 'THREE WAY ANOVA';
    DATA THREEWAY;
(1) INPUT SEX AGE TREAT Y @@;
    LINES;
(2) 1 1 1 4   1 1 1 6   1 1 1 9
    1 1 2 2   1 1 2 3   1 1 2 8
    1 1 3 11  1 1 3 9   1 1 3 16
    1 2 1 9   1 2 1 11  1 2 1 8
    1 2 2 11  1 2 2 7   1 2 2 8
    1 2 3 10  1 2 3 14  1 2 3 9
    2 1 1 10  2 1 1 2   2 1 1 8
    2 1 2 12  2 1 2 7   2 1 2 15
    2 1 3 3   2 1 3 7   2 1 3 4
    2 2 1 7   2 2 1 6   2 2 1 12
    2 2 2 9   2 2 2 11  2 2 2 7
    2 2 3 10  2 2 3 15  2 2 3 8
    PROC PRINT;
    PROC GLM;
(3) CLASS SEX AGE TREAT;
(4) MEANS SEX AGE TREAT SEX*AGE
      SEX*TREAT AGE*TREAT
      SEX*AGE*TREAT;
(5) MODEL Y = SEX|AGE|TREAT;


(1) In the INPUT statement we list the variables in the analysis.

(2) The first 3 numbers for each block of 4 numbers are the cell ID, with the fourth number being the
score on the dependent variable. Thus, the first subject in cell 111 has a score of 4, the second subject in
cell 111 has a score of 6 and the third subject a score of 9. Although not necessary, we have put the data
for each cell on a separate line for ease of reading.

(3) This CLASS statement lists the grouping variables (factors) for the ANOVA.

(4) The MEANS statement is needed to obtain the level means (SEX AGE TREAT), the means for
the two way interactions (SEX*AGE SEX*TREAT AGE*TREAT) and the cell means.

(5) This is the abbreviated way of representing a full three way factorial model in SAS.

TABLE 4.6
SPSS GLM Printout for Three Way ANOVA: Tests of Significance
and Marginal Means for All Effects


[Printout not reproduced in this copy.]




mean (nJ) or column mean (nI) is based. In calculating the sums of squares for the
main effects in three way ANOVA we simply deviate the level means about the
grand mean, and the weighting factor in each case is the number of observations on
which each level mean is based. Now the grand mean for the ANOVA in Table 4.6
is the average of the level means for sex and is thus 8.5555. Since each sex level
mean is based on 18 observations, the sum of squares for the sex main effect is:


$$SS_{sex} = 18[(8.6111 - 8.5555)^2 + (8.5000 - 8.5555)^2] = .11129$$


The discrepancy from the value in Table 4.6 is due to rounding error. Similarly,
the sum of squares for treatment is given by


$$SS_{treat} = 12[(7.6666 - 8.5555)^2 + (8.3333 - 8.5555)^2 + (9.6666 - 8.5555)^2] = 24.88864$$

The calculations for a two way interaction effect are exactly the same as for the
two way ANOVA (see Table 4.6), after one has collapsed on the levels of the third 
factor. To illustrate we consider the calculation of sum of squares for the sex by 
treatment interaction. The means for sex by treatment combined over the two age 
groups, along with the row and column means and the interaction effects (in
parentheses) are:


                        Treatments
               1           2           3
        1    7.8333      6.5000     11.5000      8.6111
            (.1111)     (-1.89)      (1.78)
Sex
        2    7.5000     10.1667      7.8333      8.5000
            (-.1111)     (1.89)     (-1.78)
             7.6667      8.3333      9.6666      8.5555


Recall that the formula for sum of squares interaction is


$$SS = n \sum_i \sum_j \phi_{ij}^2$$


where n is the number of observations in each cell and $\phi_{ij}$ is the estimated
interaction effect for the ijth cell, and


$$\phi_{ij} = \bar{x}_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x}$$


Thus, the sum of squares here is


$$SS = 6[(.1111)^2 + (-1.89)^2 + (1.78)^2 + (-.1111)^2 + (1.89)^2 + (-1.78)^2] = 80.89$$


The calculations for interaction sum of squares for sex by age and age by treat- 
ment are similar, and are left as exercises. 

To calculate sum of squares for the three way interaction effect we first compute
variability of the cell means about the grand mean (denote this by $SS_{cell(ABC)}$). The
means are:


                    Males                           Females
Age         1        2        3            1         2         3
14       6.3333   4.3333   12.0         6.6667   11.3333    4.6667
17       9.3333   8.6667   11.0         8.3333    9.0000   11.0000


$$SS_{cell(ABC)} = 3[(6.3333 - 8.5555)^2 + (4.3333 - 8.5555)^2 + (12 - 8.5555)^2 + \cdots + (9 - 8.5555)^2 + (11 - 8.5555)^2] = 221.556$$


TABLE 4.7
SPSS 12.0 GLM Options Screen for Obtaining
Marginal Means and Interaction Means


[Screen image not reproduced in this copy.]


From this quantity we subtract all variation due to the main effects and the first
order interactions. What remains is variability due to the three way interaction
effect:


$$SS_{ABC} = SS_{cell(ABC)} - SS_A - SS_B - SS_C - SS_{AB} - SS_{AC} - SS_{BC}$$

With A = sex, B = age, and C = treatment,

$$SS_{ABC} = 221.556 - .1111 - 36 - 24.8888 - .1111 - 80.8888 - 4.6667 = 74.889$$

The error term for equal cell size, as was true for two way ANOVA, is simply
the average of the cell variances.


$$MS_w = (6.3333 + 10.3333 + 13 + \cdots + 10.3333 + 4 + 13)/12 = 9.0555$$
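
The definitional calculations above can be reproduced in a few lines of code. The following is a sketch in Python using only the standard library; it is not the SAS or SPSS analysis, and the function and variable names are ours. It relies on the design being balanced (3 observations per cell), so that each two way interaction sum of squares can be obtained by subtracting the two main effect sums of squares from the variability of the collapsed cell means.

```python
from itertools import product
from statistics import mean, variance

# Scores keyed by cell ID (sex, age, treatment), copied from the data table.
cells = {
    (1, 1, 1): [4, 6, 9],   (1, 1, 2): [2, 3, 8],    (1, 1, 3): [11, 9, 16],
    (1, 2, 1): [9, 11, 8],  (1, 2, 2): [11, 7, 8],   (1, 2, 3): [10, 14, 9],
    (2, 1, 1): [10, 2, 8],  (2, 1, 2): [12, 7, 15],  (2, 1, 3): [3, 7, 4],
    (2, 2, 1): [7, 6, 12],  (2, 2, 2): [9, 11, 7],   (2, 2, 3): [10, 15, 8],
}

LEVELS = [(1, 2), (1, 2), (1, 2, 3)]  # levels of sex, age, treatment
grand_mean = mean(y for scores in cells.values() for y in scores)


def ss_between(factors):
    """Sum of n_group * (group mean - grand mean)^2 over the groups formed by
    crossing the factors at the given positions (0 = sex, 1 = age, 2 = treat)."""
    total = 0.0
    for combo in product(*(LEVELS[f] for f in factors)):
        group = [y for key, scores in cells.items()
                 if all(key[f] == level for f, level in zip(factors, combo))
                 for y in scores]
        total += len(group) * (mean(group) - grand_mean) ** 2
    return total


ss = {(f,): ss_between((f,)) for f in (0, 1, 2)}            # main effects
for i, j in ((0, 1), (0, 2), (1, 2)):                        # two way interactions
    ss[(i, j)] = ss_between((i, j)) - ss[(i,)] - ss[(j,)]
ss_cells = ss_between((0, 1, 2))                             # cell means about grand mean
ss_abc = ss_cells - sum(ss.values())                         # three way interaction
ms_w = mean(variance(scores) for scores in cells.values())   # average cell variance

names = {(0,): "sex", (1,): "age", (2,): "treat",
         (0, 1): "sex*age", (0, 2): "sex*treat", (1, 2): "age*treat"}
for key, value in ss.items():
    print(f"{names[key]:10s} {value:9.4f}")
print(f"{'3-way':10s} {ss_abc:9.4f}")
print(f"{'MS_w':10s} {ms_w:9.4f}")
```

Carried to full precision, the sex sum of squares comes out at about .1111, the three way interaction at about 74.889, and the error mean square at about 9.0556, in line with the hand calculations up to rounding.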


Four Way Analysis of Variance 


In a four way analysis of variance we are examining the effect of 4 independent 
variables on some dependent variable. We consider an example from the literature 
to illustrate. Chase (1986) examined the effect of penmanship quality, sex, race, 
and reader expectation on the grade given an essay test. The graders were 80 ele- 
mentary and middle school inservice teachers in an integrated large urban area of 
the Midwest. Each grader was given contrived student school records, some of 
which contained mainly As and Bs while others contained mostly Ds and Us. 
These records were intended to create in the essay reader a set as to the level of 
achievement expected from the students whose paper was being graded. Thus 
there were two levels for reader expectation. The essays were written in poor and 
good quality of penmanship, as judged by the Ayres handwriting scale. Thus, a 2 x
2 x 2 x 2 four way ANOVA was run. As mentioned earlier, in a 4 way design there
are 16 sources of variation and 15 hypotheses that are tested. For the Chase exam- 
ple the 15 effects are: 


Penmanship (A)
Sex (B)                   Main Effects
Race (C)
Expectation (D)

A x B
A x C
A x D                     First Order Interactions
B x C
B x D
C x D

A x B x C
A x B x D                 Second Order Interactions
A x C x D
B x C x D

A x B x C x D


One does not see very many 4 or 5 way ANOVAs in the literature. A couple of
reasons for this are (1) the difficulty of interpreting higher order interactions and
(2) the sample size required so that the cell frequencies are not extremely small
(like 1 or 2 subjects).

We wish to discuss a caution in using such designs for another reason. While the
use of complex ANOVA designs is the only way to get at higher order interactions,
and their "real" existence may have important practical implications, the key word
here is real. Remember in a 4 way ANOVA we are testing 15 hypotheses. To some
researchers this may seem like a boon, but it can be a bane if one is not careful.
Researchers using such designs often interpret any effects that are significant at the
.05 level. The potential danger with this for 3, 4, or 5 way ANOVAs is that the
overall α level gets out of control. Recall again from Chapter 2 that if we are testing k
hypotheses, each at the .05 level, then an upper bound on overall α is given by
$1 - (1 - .05)^k$. Below we list the upper bound on overall α for 3, 4, and 5 way ANOVAs
if the .05 level is used for each effect:


                                       Three Way    Four Way    Five Way
Number of hypotheses being tested          7           15          31
Upper bound on overall α                  .30         .536        .79
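
The entries in this table follow from the formula just given. Below is a short sketch in Python; the function name is ours, and the count of hypotheses uses the fact that a full factorial design with m factors has 2^m - 1 effects (main effects plus all interactions).

```python
def overall_alpha_bound(k, alpha=0.05):
    """Upper bound on overall alpha when k hypotheses are each tested at alpha."""
    return 1 - (1 - alpha) ** k


# A full m way factorial design has 2**m - 1 effects to test.
hypotheses = {m: 2 ** m - 1 for m in (3, 4, 5)}
bounds = {m: overall_alpha_bound(k) for m, k in hypotheses.items()}

for m in (3, 4, 5):
    print(f"{m} way: k = {hypotheses[m]:2d}, bound = {bounds[m]:.3f}")
```

The computed bounds are about .302, .537, and .796, which agree with the tabled values to the two or three places shown there.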


The results of the Chase study mentioned previously provide a perfect illustra- 
tion of the danger. In that study the focus was on looking for interactions, but no 
specific interactions were hypothesized a priori to be significant. However, two 
significant higher order interactions were found at the .05 level, and these were the 
only significant results. The cell size was equal so that the overall α = .536. But this
is saying that the probability of at least one false rejection is uncomfortably high. 
Thus, the two significant results found could very well be type I errors or spurious 
results. At the very least, the reader should be warned of this possibility. 

A simple way of controlling the escalating overall α level for 3 and 4 way
designs is to test each effect at a more stringent α level, say α = .01. Then we are
assured that overall α ≤ .07 for the 3 way and overall α ≤ .15 for the 4 way design
because of the Bonferroni inequality. But the price of this is even worse power for
detecting interactions. Now, if sample size is large enough, then we can have the
luxury of setting α = .01 and still have adequate power. For example, if n = 270 in a
2 x 3 x 3 design, then power will be good to adequate for detecting all effects
(although marginally so for the BC and ABC interactions).


An Improved Bonferroni Type Procedure 


Holland and Copenhaver (1988) discuss several new and improved competitors to 
the Bonferroni procedure. The one we consider here and illustrate is due to Holm 
(1979). For this procedure one needs the p values (tail probabilities) for each hy- 
pothesis being tested, but since these are printed out on the major statistical pack- 
ages this is no problem. 

The general problem then is to keep overall α under control when testing a set of
k hypotheses. The k hypotheses could be of a variety of forms: (1) the numerous F
tests from a complex factorial design, (2) the numerous t tests involved if one
compares two groups on a large set of dependent variables, (3) examining a large number
of 2 x 2 contingency tables from say an original 5 x 7 contingency table, and (4)
determining which of 50 individual between-set correlations are significant, in
analyzing the association between two sets of variables (5 in one set and 10 in the other
set).

In the Holm procedure the p values for the k hypotheses are ordered from
smallest to largest: $p_{(1)} \le p_{(2)} \le \cdots \le p_{(k)}$. Tied p values can be ordered arbitrarily.

Let $H_{(1)}, \ldots, H_{(k)}$ denote the hypotheses corresponding to these ordered p
values. Suppose i* is the smallest integer from 1 to k such that


$$p_{(i^*)} > \alpha/(k - i^* + 1)$$


Then the Holm procedure rejects $H_{(1)}, \ldots, H_{(i^*-1)}$ and retains $H_{(i^*)}, \ldots, H_{(k)}$. The
increased power for the Holm procedure comes from the fact that α/(k - i + 1) is
larger than α/k for every i > 1.

To illustrate the use of the Holm procedure consider a hypothetical 4 way
ANOVA. Suppose we wish to control overall α at .10 for the k = 15 hypotheses and
that the ordered p values are:


  i      p(i)      α/(k - i + 1)
  1     .0001        .0067
  2     .0001        .0071
  3     .0017        .0077
  4     .0046        .0083
  5     .0053        .0091
  6     .0078        .0100
  7     .0094        .0111
  8     .0113        .0125
  9     .0435        .0143
 10     .0896        .0167
 11     .1342        .0200
 12     .2689        .0250
 13     .4625        .0333
 14     .5813        .0500
 15     .6437        .1000


We see here that i* is 9, where p(9) = .0435 > .0143. Thus, by the Holm
procedure we would declare 8 effects in the design significant, with assurance that
overall α ≤ .10. If Bonferroni had been used, an effect would need a p < .0067 to be
declared significant, and only 5 effects would have been significant.
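
The step-down logic of the Holm procedure is easy to express in code. Here is a minimal sketch in Python applied to the 15 ordered p values above; the function names are ours, and the Bonferroni count is included only for comparison.

```python
def holm_rejections(p_values, alpha):
    """Number of hypotheses rejected by the Holm step-down procedure."""
    k = len(p_values)
    rejected = 0
    for i, p in enumerate(sorted(p_values), start=1):
        if p > alpha / (k - i + 1):   # first p value exceeding its cutoff: stop
            break
        rejected += 1
    return rejected


def bonferroni_rejections(p_values, alpha):
    """Number of hypotheses rejected when each is tested at alpha / k."""
    k = len(p_values)
    return sum(p < alpha / k for p in p_values)


p_values = [.0001, .0001, .0017, .0046, .0053, .0078, .0094, .0113,
            .0435, .0896, .1342, .2689, .4625, .5813, .6437]

print(holm_rejections(p_values, alpha=0.10))        # 8
print(bonferroni_rejections(p_values, alpha=0.10))  # 5
```

The loop stops at the ninth ordered p value, so 8 hypotheses are rejected, against 5 for the Bonferroni cutoff of .10/15.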
4.5 A COMPREHENSIVE COMPUTER EXAMPLE
USING REAL DATA


To tie together several elements discussed in this chapter, we consider a computer 
analysis of the CARTOON data set. In this study an instructional slide presentation 
(18 slides) was developed, with the topic being the behavior of people in a group 
situation, and in particular the various roles or character types that group members 
often assume. Each role was identified by an animal. Each animal was shown on 
two slides, once in a cartoon sketch and once in a realistic picture. A random half of 
the 179 subjects saw the slides in black and white and the other half saw the slides 
in color. The subjects were immediately posttested for the number of cartoon
characters they could identify (CARTOON1) and for the number of realistic characters
they could identify (REAL1). They were retested 4 weeks later on the same two
variables. Three groups of subjects were involved in the study: preprofessional and 
professional personnel from three hospitals and a group of Penn State college stu- 
dents. For the computer analysis, in contrast to the first edition of this text, I just 
consider the REAL2 variable for this 2 x 3 design (color of presentation by type of 
subject). Our analysis serves to illustrate and integrate several aspects of practical 
data analysis. 

The analysis was run on SPSS MANOVA, and only on subjects for which there 
is complete data. Note that this reduces the effective sample size substantially
(from 179 to 105). Immediately we encounter a couple of problems that typify 
"real world" data analysis. First, the cell sizes are sharply unequal. Second, there is 
a fair amount of missing data (many subjects did not show up for the retest). 
Missing data is a fairly common occurrence in certain areas of research, and there 
is no simple solution for this problem. If it can be assumed that the data is missing 
at random, then there is a sophisticated procedure available for obtaining good es- 
timates (Johnson & Wichern, 1988, pp. 197—202). On the other hand, if the random 
missing data assumption is not tenable (usually the case), then there is no general 
consensus as to what should be done. There are various suggestions, like using the 
mean of the scores on the variable as an estimate, or using regression analysis 
(Frane, 1976). Probably the “best” solution is to make every attempt to minimize 
the problem before and during the study, rather than having to manufacture data. 
The statistical packages SAS and SPSS have various ways of handling missing 
data. The default option for both, however, is to delete the case if there is missing 
data on any variable for the subject (called listwise deletion). 

Now recall that the homogeneity of variance assumption for factorial designs is 
that the cell population variances are equal. Since the cell sizes are sharply un- 
equal, a violation of this assumption will distort the type I error rate, and it is im- 
portant to check this assumption. Fortunately this assumption is tenable (using 
Cochran's test, p = .401).

TABLE 4.8
SPSS MANOVA Control Lines and Selected Printout for CARTOON Data


TITLE 'TWO WAY ANOVA ON CARTOON DATA FOR REAL2'.
DATA LIST FIXED/ID 1-3 COLOR 5 ED 7 LOCATION 9 OTIS 11-13 CARTOON1
 15 REAL1 17 CARTOON2 19 REAL2 21.
BEGIN DATA.

 DATA (FROM 3.5 FLOPPY DISK)

END DATA.
MANOVA REAL2 BY COLOR(0,1) ED(0,2)/
 PRINT = CELLINFO (MEANS) /.


Tests of Significance for REAL2 using UNIQUE sums of squares
Source of Variation        SS       DF       MS        F     Sig of F
WITHIN CELLS             532.74     99      5.38
COLOR                     15.04      1     15.04      2.79     .098
ED                        79.95      2     39.98      7.43     .001
COLOR BY ED               12.62      2      6.31      1.17     .314
(Model)                   95.89      5     19.18      3.56     .005
(Total)                  628.63    104      6.04

R-Squared = .153
Adjusted R-Squared = .110
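
The entries of the printout are tied together by simple arithmetic, which gives a quick check on any single value. The sketch below, in Python, assumes the usual definitions (MS = SS/df, F = MS divided by the within cells MS, R-squared = model SS over total SS, and the standard adjustment for degrees of freedom); the variable names are ours.

```python
# Sums of squares and degrees of freedom taken from the printout above.
ss = {"COLOR": 15.04, "ED": 79.95, "COLOR BY ED": 12.62}
df = {"COLOR": 1, "ED": 2, "COLOR BY ED": 2}
SS_WITHIN, DF_WITHIN = 532.74, 99
SS_MODEL, SS_TOTAL, DF_TOTAL = 95.89, 628.63, 104

ms_within = SS_WITHIN / DF_WITHIN
mean_square = {effect: ss[effect] / df[effect] for effect in ss}
f_ratio = {effect: mean_square[effect] / ms_within for effect in ss}

r_squared = SS_MODEL / SS_TOTAL
adj_r_squared = 1 - (1 - r_squared) * DF_TOTAL / DF_WITHIN

for effect in ss:
    print(f"{effect:12s} MS = {mean_square[effect]:6.2f}  F = {f_ratio[effect]:5.2f}")
print(f"R-squared = {r_squared:.3f}, adjusted = {adj_r_squared:.3f}")
```

Note that the three unique sums of squares do not add up to the model sum of squares; this is expected with sharply unequal cell sizes, since the effects are then correlated.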


The significant ED main effect for REAL2 is an overall test that merely tells us the
three population column means differ. It does not indicate which particular column
means are different. For this we need a post hoc procedure, just as we did in one
way ANOVA. We use the Tukey procedure. Recall that for one way ANOVA the
endpoints for the confidence intervals were given by


$$(\bar{x}_i - \bar{x}_j) \pm q_{\alpha;\,k,\,N-k} \sqrt{MS_w / n}$$


where n was the assumed common group size. Remember also that when the group 
sizes were unequal the Tukey was still applicable, provided that the population 
variances were equal and that n was replaced by the harmonic mean for each pair 
of groups. 

In application of the Tukey to factorial designs the n is replaced by the number
of observations on which each row or column mean is based, for equal cell size. 
When the cell sizes are unequal, as in the study we are examining, we again em- 
ploy the harmonic mean, but now for each pair of row and/or column sizes. 

Below we present the cell sizes for the REAL2 variable. From this table we see
that the column sizes are 24, 26, and 55.

                          Real2
               Preprof      Prof       Coll      Row Mean
                3.667       3.937      4.852       4.33
               (2.43)      (2.05)     (2.23)
               n = 12      n = 16     n = 27
                2.333       2.700      4.964       3.88
               (1.97)      (1.89)     (2.73)
               n = 12      n = 10     n = 28
Column Mean     3.0         3.46       4.91        4.11


Thus, the harmonic means for each pair of column sizes are given by 
2(24)(26)/50 = 25, 2(24)(55)/79 = 33.42 and 2(26)(55)/81 = 35.31 


Below are the calculations and the intervals:


Groups          Harmonic
Compared        Mean        Critical Value                   Mean Diff    Interval
Prof-Preprof    25          3.356 √(5.381/25) = 1.56            .46       (-1.10, 2.02)
Coll-Preprof    33.42       3.356 √(5.381/33.42) = 1.35        1.91       (.56, 3.26)
Coll-Prof       35.31       3.356 √(5.381/35.31) = 1.31        1.45       (.14, 2.76)


These intervals show that the college students differed significantly from both 
the preprofessional and the professional groups, and examination of the column 
means in the above table shows that the college students scored higher in each case. 
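
The harmonic means and the three intervals can be recomputed as follows. This is a sketch in Python that takes the studentized range value 3.356 and the error mean square 5.381 from the calculations above as given; the names are ours.

```python
from math import sqrt

Q_CRIT = 3.356   # studentized range value used in the calculations above
MS_W = 5.381     # within cells mean square from the printout

column_n = {"Preprof": 24, "Prof": 26, "Coll": 55}
mean_diff = {("Prof", "Preprof"): 0.46,
             ("Coll", "Preprof"): 1.91,
             ("Coll", "Prof"): 1.45}


def harmonic_mean(n1, n2):
    """Harmonic mean of two group sizes."""
    return 2 * n1 * n2 / (n1 + n2)


intervals = {}
for (g1, g2), diff in mean_diff.items():
    n_h = harmonic_mean(column_n[g1], column_n[g2])
    half_width = Q_CRIT * sqrt(MS_W / n_h)
    intervals[(g1, g2)] = (diff - half_width, diff + half_width)
    low, high = intervals[(g1, g2)]
    print(f"{g1}-{g2}: n_h = {n_h:.2f}, interval = ({low:.2f}, {high:.2f})")
```

An interval that covers 0 indicates that the pair of means does not differ significantly; only the Prof-Preprof interval does so here.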


4.6 POWER ANALYSIS 


Power Estimation for Two Way Analysis of Variance 


Because we are basing our treatment of power on Cohen's book (1977, revised
edition), it is very important to note here that estimation of power for factorial designs
changed significantly from the first edition (Cohen, 1969) to the second edition.
We quote from a couple of footnotes in Cohen's (1977) edition:


Readers familiar with the first edition should note that the treatment of main effects
(and even more so, of interactions) in factorial design differs considerably here. The
systematic overestimation of power for main effects by the former method proved to
be unacceptably large in some applications. The present method gives quite accurate
and unbiased results.... In the case of interactions, both the ES (effect size measure)
of formula (8.3.6) below and the n used for table entry have been changed in this
edition, thus avoiding substantial underestimation of power for interactions. (footnotes
2 and 3, p. 364 and p. 369)


The main reason for setting up a factorial design is to test for an interaction ef- 
fect. Unfortunately, as we will see shortly, the power for detecting this interaction 
can be inadequate. Suppose that we have an A x B design, with r levels for factor A 
and c levels for factor B. The effect sizes for the main effects and interaction may 
be expressed as follows: 


A main effect:    $f_A = \sqrt{(r - 1)F_A / N}$    (5)
B main effect:    $f_B = \sqrt{(c - 1)F_B / N}$    (6)
AB interaction:   $f_{AB} = \sqrt{(r - 1)(c - 1)F_{AB} / N}$    (7)


To illustrate the power calculations we consider a 2 x 3 design with 10 observa- 
tions per cell: 


                  B
         10    10    10     30
    A
         10    10    10     30
         20    20    20     60 = N


It might appear that the n we should use to enter the power tables for the A main
effect is 30, since the row mean is based on 30 subjects. However, a slight
adjustment is necessary (Cohen, 1977, p. 365). The same is true for the B main effect and
the interaction, and the following ns are what are needed:


Effect             n used to enter the table
A main effect      $n_A = [(N - rc)/r] + 1$
B main effect      $n_B = [(N - rc)/c] + 1$
AB interaction     $n_{AB} = [(N - rc)/((r - 1)(c - 1) + 1)] + 1$

The F ratios from the above study, along with the corresponding effect sizes and
power are presented below:


Effect       F      Effect Size               n to enter table               Power
A main      1.8     √(1(1.8)/60) = .173       (60 - 6)/2 + 1 = 28             .25
B main      3.6     √(2(3.6)/60) = .346       (60 - 6)/3 + 1 = 19             .64
AB inter    2.1     √(2(2.1)/60) = .265       [(60 - 6)/(2 + 1)] + 1 = 19     .40


Since there is a fairly large effect size for the B main effect, power is at least fair. 
However, for the interaction with a medium effect size, power is poor. 
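
The effect sizes and the n values used to enter the power tables follow mechanically from the F ratios. A short sketch in Python for the 2 x 3 example is given below; the names are ours, and the power values themselves still have to be read from Cohen's tables.

```python
from math import sqrt

r, c, N = 2, 3, 60                       # levels of A, levels of B, total sample size
F = {"A": 1.8, "B": 3.6, "AB": 2.1}      # F ratios from the example
df = {"A": r - 1, "B": c - 1, "AB": (r - 1) * (c - 1)}

# Effect size f for each effect: sqrt(df * F / N).
effect_size = {e: sqrt(df[e] * F[e] / N) for e in F}

# n used to enter the power tables for each effect.
n_table = {
    "A": (N - r * c) / r + 1,
    "B": (N - r * c) / c + 1,
    "AB": (N - r * c) / ((r - 1) * (c - 1) + 1) + 1,
}

for e in F:
    print(f"{e:3s} f = {effect_size[e]:.3f}  n = {n_table[e]:.0f}")
```

For this design the three denominators r, c, and (r - 1)(c - 1) + 1 are each one more than the degrees of freedom of the corresponding effect.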


Power Estimation for Three Way Analysis of Variance 


Power analysis for three way ANOVA is a straightforward generalization of that 
for two way ANOVA. Consider a general three way design: factor A—a levels, fac- 
tor B—b levels, and factor C—c levels. We simply need the following effect size 
for the three way interaction:


$$f = \sigma_{ABC} / \sigma, \quad \text{where } \sigma^2 = MS_w \text{ and } \sigma_{ABC}^2 = SS_{ABC} / N$$


It can be shown that f is related to the F statistic for the three way interaction as
follows:


$$f = \sqrt{[(a - 1)(b - 1)(c - 1)/N]\,F_{ABC}} \qquad (8)$$


In the above, u = (a - 1)(b - 1)(c - 1) is the degrees of freedom for the three way
interaction. The n that is used to enter the power tables for each effect in the design
(i.e., main effects, first order interactions and the second order interaction) is


$$n' = (N - abc)/(u + 1) + 1$$


where (N - abc) is the degrees of freedom for the error term, and u is the degrees of
freedom for the effect in question: For example, (a - 1) for the A main effect, (b - 1)
for the B main effect, (a - 1)(b - 1) for the AB interaction, etc.

To illustrate power estimation we consider a 2 x 3 x 3 design with 5 subjects per
cell. For example, the design might be sex by treatments by social class. The power
values are given for differing α and for different effect sizes.

          Power as a Function of Effect Size and α Level
          in a 2 x 3 x 3 Design with n = 5

                        f = .10         f = .25         f = .40
Effect    u    n'      .05    .10      .05    .10      .05    .10

A         1    37       --    .22       --    .70      .93    .96
B         2    25       --    .19       --    .60      .87    .93
C         2    25       --    .19       --    .60      .87    .93
AB        2    25       --    .19       --    .60      .87    .93
AC        2    25       --    .19       --    .60      .87    .93
BC        4    16       --    .16       --    .51      .81    .88
ABC       4    16       --    .16       --    .51      .81    .88

(-- marks a boxed α = .05 entry that is not legible in this copy.)

The power values at α = .05 for small and medium effect sizes are boxed in.
Notice that almost all (only 1 exception) of these values are less than .50, that is, poor.
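
The u and n' columns of this table come straight from the formula for n' given earlier. The following sketch in Python recomputes them for the 2 x 3 x 3 design with 5 subjects per cell; the names are ours.

```python
a, b, c, n_per_cell = 2, 3, 3, 5
N = a * b * c * n_per_cell              # total sample size (90)
df_error = N - a * b * c                # error degrees of freedom (72)

# Degrees of freedom u for each effect in the design.
u = {
    "A": a - 1, "B": b - 1, "C": c - 1,
    "AB": (a - 1) * (b - 1), "AC": (a - 1) * (c - 1), "BC": (b - 1) * (c - 1),
    "ABC": (a - 1) * (b - 1) * (c - 1),
}

# n' used to enter the power tables: (N - abc)/(u + 1) + 1.
n_prime = {effect: df_error / (df + 1) + 1 for effect, df in u.items()}

for effect in u:
    print(f"{effect:4s} u = {u[effect]}  n' = {n_prime[effect]:.1f}")
```

For the effects with u = 4 the formula gives 15.4 before any rounding, where the table lists 16.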

In Chapter 3 on power analysis we indicated that SPSS MANOVA can be used
to obtain estimates of power for various fixed effects univariate and multivariate
tests, and showed in Table 3.2 the control lines for obtaining the power estimates
for the t test and a one way ANOVA. Obtaining power estimates for the various
effects in a factorial design is equally simple. For example, for the data on page
174, we simply insert after the MANOVA command the following subcommands


POWER = F(.05) / 
PRINT = CELLINFO(MEANS) SIGNIF(EFSIZE) / 


Directly beneath the ANOVA table SPSS prints out the effect sizes and power 
estimates for each effect in the design. It looks like this: 


[Printout not reproduced in this copy.]

Recall from Chapter 3 that a partial η² around .01 indicates a small effect size, a
partial η² around .06 a medium effect size, and a partial η² around .14 a large
effect size. For the above design there are three large effect sizes (for age main effect,
the sex by treat interaction, and for the three way interaction). Recall that the above
two interaction effects were significant at the .05 level, while the age main effect
was not significant. Here, because of the very small sample size (only 3 subjects
per cell), power was only adequate (around .70) when the effect size was very
large.
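
The three large effect sizes can be confirmed from the sums of squares calculated earlier in the chapter. The sketch below, in Python, assumes the usual definition of partial η² as the effect sum of squares divided by the effect plus error sums of squares, and takes the error sum of squares as the error mean square (9.0555) times its 24 degrees of freedom; the names are ours.

```python
MS_W, DF_ERROR = 9.0555, 24       # error mean square and its df (36 subjects - 12 cells)
ss_error = MS_W * DF_ERROR

# Sums of squares from the definitional calculations for the three way example.
ss_effect = {
    "sex": 0.1111, "age": 36.0, "treat": 24.8888,
    "sex*age": 0.1111, "sex*treat": 80.8888, "age*treat": 4.6667,
    "sex*age*treat": 74.889,
}

# Partial eta squared: SS_effect / (SS_effect + SS_error).
partial_eta_sq = {name: ss / (ss + ss_error) for name, ss in ss_effect.items()}
large = [name for name, value in partial_eta_sq.items() if value >= 0.14]

for name, value in partial_eta_sq.items():
    print(f"{name:14s} {value:.3f}")
print(large)   # ['age', 'sex*treat', 'sex*age*treat']
```

The age main effect just reaches the .14 guideline, while the two interactions are well above it.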


4.7 FIXED AND RANDOM FACTORS


At this point it is important to distinguish between fixed and random factors. All 
that we have considered to this point is what is called fixed effects ANOVA. For ex- 
ample, in comparing three different diets (one way ANOVA), the diets are not ran- 
domly sampled from some population of diets, but rather they are fixed by the ex- 
perimenter. Furthermore, the experimenter is not interested in generalizing to 
some population of diets but wishes to determine which of the diets in the study is 
superior to the others. Thus, inferences in the study are "fixed" or limited to the di- 
ets under consideration. There are situations in factorial designs where the experi- 
menter may wish to generalize beyond the given levels of a factor in the study, and 
in this case the factor is considered random. Let us consider two examples to illus- 
trate. 

First, suppose we want to compare three different teaching methods (fixed fac- 
tor) in 7 randomly selected schools in some metropolitan area. The investigator 
wishes to generalize to the population of schools in this area. Schools is the ran- 
dom factor and we have what is called in the literature a mixed model, since one 
factor (methods) is fixed while the other factor is random. 

As a second example suppose we are comparing the effect of two reading meth- 
ods on comprehension for second graders. We select 5 stories that we consider to 
be representative of second grade reading material. We have a 2(methods) x 5(sto- 
ries) design. We wish to generalize the results to all stories, so that stories is the 
random factor. 

A random factor(s) in the design introduces another complication: different
error terms (something other than MSw) are needed for testing some of the effects for
significance. For instance, for the teaching methods by schools example, while
MSw is appropriate for testing the school main effect and the interaction effect for
significance, the method x school interaction mean square is the appropriate error
term for testing the method main effect.
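
To make the choice of error term concrete, here is a small sketch in Python for the methods (fixed) by schools (random) example. The mean squares are hypothetical numbers invented purely for illustration; only the pairing of each effect with its error term follows the text.

```python
# Hypothetical mean squares for a 3 (methods, fixed) x 7 (schools, random) design.
mean_square = {"method": 120.0, "school": 45.0, "method*school": 30.0, "within": 10.0}

# Error term for each effect in this mixed model, as described above.
error_term = {
    "method": "method*school",   # fixed effect tested against the interaction
    "school": "within",
    "method*school": "within",
}

f_ratio = {effect: mean_square[effect] / mean_square[error_term[effect]]
           for effect in error_term}

for effect, f in f_ratio.items():
    print(f"{effect:14s} F = {f:.2f}  (error term: {error_term[effect]})")
```

With these made-up values the method F ratio is 4.0, whereas dividing by the within cells mean square, as in a fixed effects analysis, would have given 12.0.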
4.8 SUMMARY


1. In two way ANOVA we are examining the effect of two independent vari- 
ables (factors) on some dependent variable. 

2. For an A x B design there are 3 hypotheses to be tested: The A main effect
(that the population row means are equal), the B main effect (that the population 
column means are equal), and the A x B interaction effect. 

3. An interaction means that the effect one factor has on the dependent variable 
is not the same for all levels of the other factor. Two types of interaction, ordinal 
and disordinal, were discussed. 

4. The same error term, MSw, is used for testing each of the 3 effects. It is a 
pooled estimate of within cell variability, and for equal cell size is just the average 
of the cell variances. 

5. For balanced designs (equal cell n) the sums of squares are independent,
although the F tests are not independent because they share a common error term.
However, for total sample size even moderately large the amount of dependence is small and
can be ignored for practical purposes.

6. For disproportional cell size the sums of squares are correlated (con- 
founded). Several methods have been suggested in the literature for analyzing such 
designs. There is now a consensus that generally the regression approach, where 
the unique contribution of each effect is obtained, should be used. If, however, an a 
priori ordering of the effects can be established, then the sequential sum of squares 
approach makes sense. 

7. The regression approach, which yields the unique variation due to each
effect, is denoted by Type III sum of squares in SAS and SPSS for Windows.

8. Aptitude x treatment interaction (ATI) is a broad area of research that uses 
factorial designs, and is concerned with the possible moderating effect any individ- 
ual difference characteristic (sex, age, locus of control, etc.) of subjects may have 
on their response to treatments. 

9. In a three way ANOVA there are 7 hypotheses that are tested. The 3 main
effects test whether the population level means are equal. The nature of the two way
interactions is ascertained by examining the means for each pair of factors lumped 
over the third factor. The three way interaction indicates whether the patterns of 
means (profiles) for any two factors are different for the levels of the third factor. 

10. Power estimation for two and three way ANOVA was discussed using Co- 
hen’s approach. 

11. An improved Bonferroni type procedure, which makes use of p values, is 
discussed and illustrated. 

12. A comprehensive computer example, using real data, is used to illustrate 
and integrate several important concepts from the chapter, as well as indicating 
some aspects of practical data analysis.

13. For 3 and 4 way ANOVA there are many hypotheses being tested (7 for
three way and 15 for four way). It is important to note that if the .05 level is used for
each effect, then the overall α level becomes quite high. Thus, 1 or 2 significant
results from such a design, if not hypothesized a priori, could well be spurious.

14. The distinction between fixed and random factors is illustrated with some 
examples. 


EXERCISES 
1. Can you think of a fourth advantage of a factorial design? 


2. Consider the following hypothetical data for an Age by Treatments facto- 
rial design: 


TREATMENTS 
1 2 3 
10 yrs 21, 27,23 24, 32, 30 19, 30, 27 
28, 20 35,32 20,21 
AGE 
12 yrs 18, 25, 27 24, 16, 18 34, 28, 21 
20, 23 19, 20 30, 29 


(a) Test each of the effects for significance at the .05 level using the
definitional formulas given in the text. Use your calculator to obtain the mean
and variance for each cell and then go from there.

(b) Which of the effects, if any, are significant? Interpret any effect which 
is significant. 


3. Run problem 2 on the SAS GLM program. 
4. Suppose that a study like that for problem 2 had been conducted, starting
with 5 subjects per cell, but that for various reasons several subjects
dropped out of the study leaving the following disproportional cell size
data set:

                         TREATMENTS
                 1              2              3
       10 yrs    21, 27, 23     24, 35, 32     19, 30
                 28, 20                        20, 21
AGE
       12 yrs    25, 20         24, 16, 18     28, 21
                                19, 20         30, 29

(a) Run this data set on both SAS GLM and on SPSSX MANOVA.
Which effects are significant at the .05 level? Interpret any significant
effects.

(b) Are the Type I and Type III sums of squares different for all the
effects? Explain.


5. Consider the following results from a 2 x 3 factorial ANOVA (4 subjects
per cell) study by Pukulski (The Reading Teacher, 1970, 515-522):

Source of Variation       df       MS           F
Sex                        1       308.16        .528
Reinforcement              2      3251.29       5.57*
Sex by Reinforcement       2      1094.29       1.87
Error                     18       583.78
*p < .05

(a) Estimate what his power was at α = .10 for detecting the reinforcement
main effect.
(b) Estimate power at α = .10 for detecting the interaction effect.
(c) Given the result in (b), what would you recommend Pukulski do in a
followup study?
6. Explain what Cronbach meant when he said, “Once we attend to interac- 
tions we enter a hall of mirrors that extends to infinity.” 
7. Suppose an investigator in a heuristic study has a two way design and 5
dependent variables. He runs 5 univariate two way ANOVAs, that is, he does
a two way ANOVA on each dependent variable separately. Four of the effects
are significant at the .05 level, and he is excited by these results and
discusses them in some detail.

(a) What is the total number of statistical tests that was done here? 

(b) What is the upper bound on overall α?

(c) Given the result in (b), should the investigator be excited, or should he 
be cautiously optimistic? 


8. Consider again the method by teacher by sex example on p. 150.


(a) Suppose that only the AC interaction was significant. Interpret what 
this result means. Give a pattern of means that is congruent with the above 
result. 

(b) Suppose that the A and C main effects and the AC interaction were the 
only significant effects. Interpret these results, and give a pattern of means 
that is congruent with these results. 


9. In Section 4.4 we indicated, using definitional formulas, how various sums
of squares for a three way ANOVA are calculated. Finish the calculations
for that example by

(a) calculating the age by treatment sum of squares 

(b) calculating the sex by age sum of squares 

(c) calculating the age main effect sum of squares 


10. Construct the treatment by race profiles for example 1 on p. 149 and
interpret.


11. An investigator has a 3 x 3 (treatments by social class) factorial design and
from previous literature anticipates a medium treatment main effect and a
medium interaction effect. She wishes to know if having 10 subjects in
each cell will yield adequate power (> .70) for detecting these two effects.
Given these results, what is the estimated power for detecting the
interaction at α = .10? Is power now adequate? Obtain a somewhat rough estimate
(since it involves extrapolation) of power at α = .15.


12. Suppose an investigator actually hypothesized a significant three way
interaction effect (don’t expect to find this very often in the literature). He
wishes to detect a medium or larger three way interaction effect with power
= .70 at α = .10. He has a 2 x 2 x 3 design. How many subjects per cell are
needed?


13. A study by Tuckman, Steber, and Hyman (1979) had principals rate teachers
in their schools, whom they had previously nominated as effective or
ineffective, on the four dimensions of the Tuckman Teacher Feedback
Form: creativity, dynamism, organized demeanor, and warmth and acceptance.
There were 180 teachers rated, one-third each at the elementary,
intermediate, and high school levels. The primary question in the study, in
the authors' words, was, “Do principals’ judgments across the four dimensions
of teaching style vary from elementary to intermediate to senior high
school principals? That is, do principals at the three levels perceive the four
dimensions differently as elements of effective versus ineffective teaching?”
They hypothesized that elementary principals would see warmth and acceptance
and creativity as contributing most to the discrepancy between
most effective and least effective teachers, while dynamism and organized
demeanor were expected to be higher in importance for intermediate and
high school principals.

To test their hypothesis, four two way ANOVAs were run, a separate
ANOVA for each dimension of the TTFF. The independent or grouping
variables are school level and effective-ineffective dimension. The following
results were obtained:


                  CREATIVITY     DYNAMISM      ORGANIZED      WARMTH AND
                                               DEMEANOR       ACCEPTANCE
SOURCE     DF     MS     F       MS     F      MS     F       MS     F
SCHOOL

The asterisks indicate those effects with a p value less than .01. 
Also, the following means were obtained on the four variables for the six 
cells in the factorial design: 


            CREATIVITY      DYNAMISM       ORGANIZED       WARMTH AND
                                           DEMEANOR        ACCEPTANCE
            M.E.   L.E.     M.E.   L.E.    M.E.   L.E.     M.E.   L.E.
ELEM.       27.3   22.4     25.7   28.9    34.8   27.9     39.3   23.9
INTERM.     29.2   21.8     27.9   22.8    36.8   27.0     35.6   26.5
SENIOR      24.9   15.9     28.2   17.6    36.3   24.4     31.7   26.7
(a) What are the significant interaction effects on DYNAMISM and 
WARMTH AND ACCEPTANCE telling us? 
(b) Explain why these interaction effects occurred, using the cell means. 
(c) What part(s) of their hypotheses are confirmed by the above analysis? 
(d) Is further analysis necessary to validate or invalidate some of their hy- 
potheses? 
14. A study by Bryan (1974) investigated the peer popularity of learning
disabled children. The learning disabled and a sample of control “normal”
subjects each consisted of 35 white and 29 black boys and 10 white and 10
black girls. The children were in grades 3, 4, and 5. A combination of two
sociometric techniques was used to assess peer popularity. The measures
included: (a) the choice of three classmates as friends, classroom neighbors,
and invitees to a birthday party; (b) the choice of three classmates
who are not friends or neighbors or invitees to a birthday party; and (c) the
Guess Who Technique. Sample items from this procedure include: “Who
finds it hard to sit still in class? Who is handsome or pretty? Who is always
worried or scared?” The scores of the children on items from the above three
categories were the sum of the number of classmates who nominated the
subject on that item, divided by the total number of votes cast within the
classroom. The relationships among the items indicated that the 20 items
could be divided into two scales: social acceptance and social rejection.
These were the two dependent variables for the study. The percentages on
these two variables were transformed into arc sine equivalents before analysis.
This transformation is appropriate when there is reason to believe
there may be a relationship between the means and the variances. Such is
the case when the dependent variable involves proportions or percentages
(see Myers, 1979, p. 73), as in this study. Subjects were cross classified on
group (learning disabled or control), sex, and race, and three way
ANOVAs were run on each dependent variable, using a least squares analysis.
The following results were obtained:


                  Social Acceptance             Social Rejection
                  Mean Sq    F (df = 1,160)     Mean Sq    F (df = 1,160)

Group (A)         .809       19.896***          .589        9.118**
Sex (B)           .149        3.667             .004         .055
Race (C)          .000         .008             .007         .112
A x B             .032         .797             .313        4.850*
A x C             .233        5.737**           .932       14.415***
B x C             .001         .019             .029         .447
A x B x C         .094        2.320             .173         .104

*p < .05
**p < .01
***p < .001


(a) Why is the degrees of freedom for error = 160? 

(b) What is the numerical value of the error term for social acceptance 
and social rejection? 

(c) The cell sizes in the study were unequal, but the exact cell sizes were 
not reported. Given this, should the author have checked the homogeneity 
of variance assumption? Why, or why not? 

(d) The author presents the following table for interpreting the signifi- 
cant AC interactions for social acceptance and rejection: 


SOCIAL REJECTION SOCIAL ACCEPTANCE 

WHITE BLACK WHITE BLACK 
LEARNING DISABLED 15 8 4 6 
CONTROL 5 9 10 7 


What type of interaction (ordinal or disordinal) resulted for social rejec- 
tion? for social acceptance? Does there seem to be a particular cell that is 
primarily responsible for the interactions in each case? 

(e) Calculate the effect size for the A x B x C interaction on social
acceptance. Is this a practically significant effect which we failed to detect
because of inadequate power?


15. Run a three way ANOVA on the data given below for a sex(2) x age(2) x
treat(3) design.


                            TREAT
SEX   AGE         1               2               3
 1     1     19,16,18,17     23,24,25,28     16,12,24,10
       2     20,17,18,19     27,31,28,25     19,18,23,27
 2     1     17,18,14,22     26,19,13,17     15,17,15,12
       2     13,18,20,19     14,13,21,18     14,18,19,11


(a) Test each of the effects at the .01 level. Which are significant? 
(b) Interpret any significant effects using the appropriate means. 


16. Consider the following subset of data (all of site 1) for a SETTING (1 for
home and 2 for school) by VIEWCAT (1 for rarely watches Sesame St to 4 
for watches the show on average of more than 5 times a week) factorial de- 
sign. The dependent variable is LETDIFF = POSTLET – PRELET, that is, 
a measure of how much the children have gained in their knowledge of let- 
ters: 


          VIEWCAT1   VIEWCAT2   VIEWCAT3   VIEWCAT4

HOME      0,4   6,4,10,9   14,7,7,28,16,32   27,21,4
          11,1,-2,6,-10   8,-1,10,26,-22   24,35,5

SCHOOL    7,3,8,-1   -1,4,17,6   11,32,10,33,14   6,6,4,4
          2,-1,-1   9,21,5,7   33,30,31,5   24,-15,8,7


(a) Run a two way ANOVA on this data using either SAS or SPSS. For
both the default is the unique variability due to each effect (called type III
sum of squares). Which effect(s) is(are) significant at the .05 level?

(b) Using the appropriate means, interpret each of the significant effects. 


17. Consider the following approximate 33% random sample of the
ATTITUDE data. Here we focus on the SEX and GRADE factors and the
change in mathematics attitude (CHGMATH).

                         GRADE
            3      4      5      6
MALE      -1,0,2,2   0,-3,0,-1   1,1,1,-1   1,2,0,3,1
          -1,0,2,2   3,-5,0,3   1,-4,1,0
FEMALE    3,-2,-2,2,-2   1,-1,-3,0   -1,-1,0,0   -1,1,1,-4,-1
          -3,-1,0,0,-1,1   2,0,0,0   0,0,-1,0,0   0,4,1,-1,0


(a) Using either SPSS or SAS, test each of the effects for significance at the 
.05 level. Which, if any, are significant? 


18. Run the HEADACHE data on SPSS or SAS, using ОМСОМЕ as the
dependent variable. What effects are significant at the .05 level?


19. Why is it indicated that with real data one will generally have unequal cell
size?


20. In a 5 way ANOVA, how many sources of variation are there?


APPENDIX 
DOING A BALANCED TWO WAY ANOVA 
WITH A CALCULATOR 


1. Obtain the mean and variance for each cell.
2. Obtain the row, column, and grand means.
3. Obtain the error term ($MS_w$) as the average of the cell variances.
4. Obtain the sum of squares and mean squares for the main effects.
5. Test each of the main effects for significance.
6. Obtain the sum of squares and mean square for the interaction effect.
7. Test the interaction effect for significance.


To illustrate the above process, we consider the following age by treatment de- 
sign: 


                              TREATMENTS
                   1                    2                    3
AGE   10 yrs   21, 27, 23, 28, 20   24, 32, 30, 35, 32   19, 30, 27, 20, 21
      12 yrs   18, 25, 27, 20, 23   24, 16, 18, 19, 20   34, 28, 21, 30, 29


The means, cell variance, and the row, column, and grand means are as follows: 


                          TREATMENTS
                     1         2         3        ROW MEANS
AGE   10 yrs       23.8      30.6      23.4        25.93
                  (12.7)    (16.8)    (23.3)
      12 yrs       22.6      19.4      28.4        23.47
                  (13.3)     (8.8)    (22.3)
COLUMN MEANS       23.2      25.0      25.9        24.70 (GRAND MEAN)


Now we move to step 3 and obtain the error term: Recall from the chapter that 
for equal cell size the error term is just the average of the cell variances. Therefore, 


$$MS_w = (12.7 + 16.8 + 23.3 + 13.3 + 8.8 + 22.3)/6 = 16.2$$


In step 4 we obtain the sum of squares and mean squares for the main effects
(note that cell size = 5):

$$SS_{age} = 15[(25.93 - 24.7)^2 + (23.47 - 24.7)^2] = 45.39$$

$$MS_{age} = 45.39/1 = 45.39$$

$$SS_{treat} = 10[(23.2 - 24.7)^2 + (25 - 24.7)^2 + (25.9 - 24.7)^2] = 37.8$$

$$MS_{treat} = 37.8/2 = 18.9$$


In step 5 we test the main effects for significance (we use the .05 level here). 
$$F_{age} = MS_{age}/MS_w = 45.39/16.2 = 2.80$$


Since the critical value at the .05 level, based on 1 and 24 degrees of freedom, is 
4.26, we fail to reject. 


$$F_{treat} = MS_{treat}/MS_w = 18.9/16.2 = 1.17$$


Here the critical value, based on 2 and 24 df, is 3.40, and we once again fail to 
reject. 

In step 6 we obtain the sum of squares and mean square for the interaction ef- 
fect. Recall that the sum of squares for interaction involved cell interaction effects, 
and that each cell interaction effect, in words, is given by cell interaction = cell 
mean + grand mean - row mean - column mean. The cell interaction effects are
given below:


                      TREATMENTS
                   1        2        3
AGE   10 yrs     -.63     4.37    -3.74
      12 yrs      .63    -4.37     3.74


Therefore, 


$$SS_{int} = 5[(-.63)^2 + (4.37)^2 + (-3.74)^2 + (.63)^2 + (-4.37)^2 + (3.74)^2] = 334.814$$

$$MS_{int} = 334.814/2 = 167.407$$


Finally, in step 7 we test the interaction effect for significance: 
$$F_{int} = MS_{int}/MS_w = 167.407/16.2 = 10.33$$


The critical value is 3.40. Thus, there is a significant interaction effect. 
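
The seven steps above can also be checked with a few lines of code. The following is a minimal sketch in Python (standard library only), assuming a balanced design; the function name `balanced_two_way_anova` and the layout of `cells` are our own choices for illustration and are not taken from SAS or SPSS. Because the code carries the means at full precision rather than rounding them to two decimals, the F ratios may differ slightly in the second decimal place from the hand calculations.

```python
from statistics import mean, variance

# Scores from the appendix example: rows are AGE (10 yrs, 12 yrs),
# columns are TREATMENTS 1-3, with 5 scores per cell.
cells = [
    [[21, 27, 23, 28, 20], [24, 32, 30, 35, 32], [19, 30, 27, 20, 21]],
    [[18, 25, 27, 20, 23], [24, 16, 18, 19, 20], [34, 28, 21, 30, 29]],
]


def balanced_two_way_anova(cells):
    """Follow steps 1-7 for a balanced (equal cell size) two way design."""
    n_rows, n_cols = len(cells), len(cells[0])
    n = len(cells[0][0])  # common cell size

    # Steps 1-2: cell, row, column, and grand means.
    cell_means = [[mean(cell) for cell in row] for row in cells]
    row_means = [mean(row) for row in cell_means]
    col_means = [mean(row[j] for row in cell_means) for j in range(n_cols)]
    grand = mean(row_means)

    # Step 3: with equal n, the error term is the average cell variance.
    ms_w = mean(variance(cell) for row in cells for cell in row)

    # Steps 4 and 6: sums of squares for main effects and interaction.
    ss_rows = n * n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * n_rows * sum((m - grand) ** 2 for m in col_means)
    ss_int = n * sum(
        (cell_means[i][j] + grand - row_means[i] - col_means[j]) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )

    # Steps 5 and 7: each mean square divided by the error term.
    return {
        "ms_w": ms_w,
        "f_rows": ss_rows / (n_rows - 1) / ms_w,
        "f_cols": ss_cols / (n_cols - 1) / ms_w,
        "f_int": ss_int / ((n_rows - 1) * (n_cols - 1)) / ms_w,
    }


result = balanced_two_way_anova(cells)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Here `f_rows` corresponds to the age effect, `f_cols` to treatments, and `f_int` to the interaction; each is compared with the same critical values used above.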
5.1 INTRODUCTION


In our discussion of one way ANOVA and factorial ANOVA the subjects were only 
measured once on the dependent variable. In this chapter we consider designs that 
measure subjects several times, either on the same dependent variable or on differ- 
ent measures. The simplest repeated measures design measures the subjects twice, 
with an intervening treatment. Schematically, we have 


Pretest Treatment Posttest 


In this case the student may recall that the t test for correlated (dependent)
samples applies. Repeated measures analysis of variance (where the subjects are
measured more than twice) is the generalization of the t test for correlated
samples, just as ANOVA (k groups) was the generalization of the t test (two
groups) for independent samples.

There are many situations in which repeated measures are either appropriate or
the natural thing to use. One example is when we are concerned with performance
trends over time. Bock (1975) presented an example comparing boys’ and girls’
performance on vocabulary acquisition over grades 8 through 11. Here the focus
is often on the mathematical form of the trend, that is, whether it is linear,
quadratic, etc.
The same type of analysis applies whether we are concerned with cognitive vari- 
ables (as above), or personality changes for a group of subjects over time, or devel- 
opmental (physiological) changes for a group of infants (children). 

Another class of repeated measures situations occurs when we are comparing 
the same subjects under several different treatments (drugs, stimulus displays of 
different complexity, etc.). For example, we may be interested in the effects of 4 
drugs on reaction time for a group of subjects, or in the effects of repeated practice 
(say over 3 sessions) on a learning task. 

Another useful application of repeated measures occurs in combination with a 
one way ANOVA design. In a one way design involving treatments the subjects are 
posttested to determine which treatment is best. If we are interested in the lasting 
or residual effects of treatments, then we need to measure the subjects a few more 
times. Huck, Cormier, and Bounds (1974) present an example in which three 
teaching methods are being compared, but in addition the subjects are again mea- 
sured 6 weeks and 12 weeks later to determine the residual effect of the methods on 
achievement. A repeated measures analysis of such data could yield a quite differ- 
ent conclusion as to which method might be preferred. Suppose the pattern of 
means looked as follows: 


Posttest Six Weeks 12 Weeks 
Method 1 66 64 63 
Method 2 69 65 59 
Method 3 62 56 52 


Just looking at a one way ANOVA on posttest scores (if significant) could lead 
one to conclude that method 2 is best. Examination of the pattern of achievement 
over time shows however that for lasting effect method 1 is to be preferred, because 
after 12 weeks the achievement for method 1 is superior to method 2 (63 vs. 59).
What we have here is an example of a method by time interaction effect. 

Another class of situations in which repeated measures designs apply is when 
the same subjects are given a series of tests or subtests. For instance, Glass and 
Hopkins (1984) present the following example. A group of 12 neurologically
handicapped children are measured on the information, vocabulary, digit span, and
block design subtests of the Wechsler Intelligence Scale for Children (WISC). If
the 12 fall into, say, 3 different types of neurological handicaps, then we may be
interested in whether certain deficits on WISC are particularly associated with
different types of handicaps. Here a subject by subtest interaction is the focus.

In this chapter we consider repeated measures designs of varying complexity. 
The simplest design involves a single group of subjects measured under various 
treatments (conditions), or at different points in time. Schematically, it looks like 
this: 


                      Treatments
               1    2    3   ...   k
           1
Subjects   2
           .
           .
           n


We then consider a one between and one within design. Many texts use the 
terms “between” and “within” in referring to repeated measures factors. A be- 
tween variable is simply a grouping or classification variable such as sex, age, or 
social class. A within variable is one on which the subjects have been measured re- 
peatedly (like time). Some authors even refer to repeated measures designs as 
within designs (Keppel, 1983). An example of a one between and one within de- 
sign is 


                         DRUGS
                    1      2      3
Schizophrenics
Depressives


Here the same schizophrenics and depressives are given three drugs to deter- 
mine which of them is best in inhibiting some undesirable response. The teaching 
methods study mentioned previously is another example of a one between and one 
within design, where methods is the between variable (different subjects taught by 
different methods) and time is the within variable. The reader should be aware that 
there are three other names that are used by some authors for the same design: 
Lindquist Type I, split plot, and two way ANOVA, with repeated measures on one 
factor. 

Next we consider a one between and two within design. As an example, sup- 
pose a researcher in child development is interested in observing three groups of 
children (ages 3, 4, and 5) in two situations at two different times (morning and af- 
ternoon) of the day. She is concerned with the extent of their social interaction, and 
will measure this by having two observers independently rate the amount of social 
interaction. The average of the two ratings will serve as the dependent variable. 
The age of the children is the grouping or between variable here. The two within
variables are situation and time of day. There are four scores for each child: social
interaction in situation 1 in the morning, social interaction in situation 1 in the
afternoon, social interaction in situation 2 in the morning, and social interaction
in situation 2 in the afternoon.

Schematically, the design is as follows: 


         SITUATION          1                  2
         TIME          Morn.   After      Morn.   After

       3 years          y1      y2         y3      y4
AGE    4 years
       5 years


where the ys represent the four social interaction measures for each subject. One 
can think of this as a three way ANOVA, but it is a different type of analysis of vari- 
ance from that in Chapter 4, because the subjects’ scores are correlated across situ- 
ation and across time, and this must be taken into account in the analysis. 
Finally, we discuss planned comparisons in repeated measures designs. 


5.2 ADVANTAGES AND DISADVANTAGES 
OF REPEATED MEASURES DESIGNS 


Recall that the two basic objectives in experimental design are elimination of sys- 
tematic bias and the reduction of error (within gp or cell) variance. The main rea- 
son for within group variability is individual differences among the subjects. One 
way of reducing error variance is considered in Chapter 7 on factorial designs, and 
that is by blocking on a variable. One may block on sex, social class, I.Q., etc. All
of the variability between blocks is removed from the error term, yielding a more 
powerful test. In repeated measures designs, blocking is carried to its extreme. We 
are blocking on each subject. Thus, variability among the subjects due to individ- 
ual differences is completely removed from the error term. This makes these de- 
signs much more powerful than completely randomized designs, where different 
subjects are randomly assigned to different treatments. 

Another distinct advantage of repeated measures designs is that far fewer sub- 
jects are required for the study. For example, if three treatments are involved in a 
completely randomized design, we may require 60 subjects (20 per treatment). 
With a repeated measures design, we would need only 20 subjects. This can be a 
very important practical advantage in many cases, since numerous subjects are not 
readily available in some areas like counseling, school psychology, clinical psy- 
chology, and nursing. 

Although increased precision and economy of subjects are two distinct advantages
of repeated measures designs, these designs have two potentially serious
disadvantages, unless care is taken. When several treatments are involved, the order
in which treatments are administered might make a difference in the subjects’
performance. Thus, it is important to counterbalance the order of treatments. For
three treatments, counterbalancing involves randomly assigning one third of the
subjects to each of the following sequences:


Order of Administration of Treatments


        A    B    C
        B    C    A
        C    A    B
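
The three sequences above form a cyclic arrangement in which each treatment appears once in every ordinal position. As a minimal sketch of how such sequences could be generated and subjects randomly assigned to them, consider the following Python code. The function names `cyclic_orders` and `assign_orders`, and the use of a seed for reproducibility, are our own illustrative assumptions and not part of any package discussed in this text.

```python
import random


def cyclic_orders(treatments):
    """Build sequences so each treatment occupies each position exactly once."""
    k = len(treatments)
    return [
        [treatments[(start + pos) % k] for pos in range(k)]
        for start in range(k)
    ]


def assign_orders(subject_ids, treatments, seed=None):
    """Shuffle the subjects, then deal them out evenly across the sequences."""
    orders = cyclic_orders(treatments)
    rng = random.Random(seed)  # seed is only for a reproducible illustration
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return {subj: orders[i % len(orders)] for i, subj in enumerate(shuffled)}


orders = cyclic_orders(["A", "B", "C"])
assignment = assign_orders(range(1, 7), ["A", "B", "C"], seed=0)
for subj in sorted(assignment):
    print(subj, " ".join(assignment[subj]))
```

With six hypothetical subjects, two are assigned to each sequence; the number of subjects should be a multiple of the number of sequences if equal group sizes are wanted.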


Another potential disadvantage is the possibility of carryover effects. Thus, it is 
important to allow sufficient time between treatments to minimize carryover ef- 
fects, which could occur for example if the treatments were drugs. How much time 
is necessary is of course a substantive, not a statistical question. Keppel (1983) and 
Myers (1979) provide further discussion of the two above potential disadvantages. 


5.3 SINGLE GROUP REPEATED MEASURES 


To illustrate how the variance is partitioned for this simplest design we consider 
the following data set: 


Treatments 
Subjects 1 2 3 Means 
1 30 28 34 30.667 
2 14 18 22 18.000 
3 24 20 30 24.667 
4 38 34 44 38.667 
5 26 28 30 28.000 
Column Means 26.4 25.6 32 28.000 (grand mean) 


We analyze this data in two different ways: (1) as a completely randomized
design (pretending there are different subjects for the different treatments), and
(2) as a univariate repeated measures analysis. The purpose of including approach 1
is to contrast the error variance that results against the markedly smaller error
variance found with the repeated measures design. We specify univariate repeated
measures analysis because there is also a multivariate approach that can be
employed. We discuss and compare the univariate and multivariate approaches
after presenting this numerical example.


5.4 COMPLETELY RANDOMIZED DESIGN


This simply involves doing a one way ANOVA, as was done in Chapter 2. Thus, we
compute the sum of squares between ($SS_b$) and the sum of squares within ($SS_w$):


$$SS_b = 5[(26.4 - 28)^2 + (25.6 - 28)^2 + (32 - 28)^2] = 121.6$$

$$\begin{aligned}
SS_w = {} & (30 - 26.4)^2 + (14 - 26.4)^2 + \cdots + (26 - 26.4)^2 && \text{(treatment 1)} \\
& + (28 - 25.6)^2 + (18 - 25.6)^2 + \cdots + (28 - 25.6)^2 && \text{(treatment 2)} \\
& + (34 - 32)^2 + (22 - 32)^2 + \cdots + (30 - 32)^2 && \text{(treatment 3)} \\
= {} & 734.4
\end{aligned}$$


Now, we need the mean squares: 


$$MS_b = SS_b/(k - 1) = 121.6/2 = 60.80 \quad \text{and} \quad MS_w = SS_w/(N - k) = 734.4/12 = 61.20$$


Therefore, $F = MS_b/MS_w = 60.80/61.20 = .99$, which is clearly not significant
at the .05 level since we have more error variation than effect variation.
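
As a check on the arithmetic, the same one way analysis can be sketched in a few lines of Python (standard library only). The function name `one_way_anova` and the variable names are our own assumptions for illustration; the data are those of Section 5.3, treated as if different subjects had received each treatment.

```python
from statistics import mean

# Scores from Section 5.3, one list per treatment (5 scores each),
# analyzed here as if the three groups were independent.
treatments = [
    [30, 14, 24, 38, 26],
    [28, 18, 20, 34, 28],
    [34, 22, 30, 44, 30],
]


def one_way_anova(groups):
    """Return SS between, SS within, and the F ratio for k independent groups."""
    k = len(groups)
    total_n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    # Between: group size times squared deviation of each group mean.
    ss_b = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within: squared deviations of scores about their own group mean.
    ss_w = sum((x - mean(g)) ** 2 for g in groups for x in g)
    f_ratio = (ss_b / (k - 1)) / (ss_w / (total_n - k))
    return ss_b, ss_w, f_ratio


ss_b, ss_w, f_ratio = one_way_anova(treatments)
print(f"SS_b = {ss_b:.1f}, SS_w = {ss_w:.1f}, F = {f_ratio:.2f}")
```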


5.5 UNIVARIATE REPEATED MEASURES ANALYSIS 


Notice that the mean responses for the subjects over the 3 treatments vary
considerably (ranging from 18 to 38.667). We quantify this variability through the
so-called sum of squares for blocks ($SS_{bl}$), where here we are blocking on subjects.
The error variability that was calculated for the completely randomized analysis is
split up into two parts, that is, $SS_w = SS_{bl} + SS_{res}$, where $SS_{res}$ stands for the
sum of squares residual. Denote the number of repeated measures by k. Now we calculate
the sum of squares for blocks, where $\bar{x}_i$ is the mean for subject i and $\bar{x}$ is
the grand mean:


$$SS_{bl} = k \sum_i (\bar{x}_i - \bar{x})^2 = 3[(30.667 - 28)^2 + (18 - 28)^2 + \cdots + (28 - 28)^2] = 696.02$$


Our error term for the repeated measures analysis is formed by subtracting the
sum of squares for blocks from the sum of squares within, $SS_{res} = SS_w - SS_{bl} =
734.4 - 696.02 = 38.38$. Note that the vast majority of the within variability is due
to individual differences (696.02 out of 734.4), and that we have removed all of
this from our error term. The variability that remains is due to within subject
variability over treatments. Now,


$$MS_{res} = SS_{res}/[(n - 1)(k - 1)] = 38.38/[4(2)] = 4.8$$


and our F ratio for the repeated measures analysis is


$$F = MS_b/MS_{res} = 60.80/4.8 = 12.67$$


with (k - 1) = 2 and (n - 1)(k - 1) = 4(2) = 8 degrees of freedom. This is significant
at the .05 level (critical value = 4.46), in contrast to the F for the randomized
analysis, which was less than 1.
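
The partition of the within variability into blocks and residual can be sketched as follows in Python (standard library only). The function name `repeated_measures_anova` and the row-per-subject data layout are our own assumptions for illustration. Since the subject means are not rounded here, the block and residual sums of squares may differ from the hand-calculated values in the second decimal place.

```python
from statistics import mean

# Scores from Section 5.3: one row per subject, one column per treatment.
scores = [
    [30, 28, 34],
    [14, 18, 22],
    [24, 20, 30],
    [38, 34, 44],
    [26, 28, 30],
]


def repeated_measures_anova(scores):
    """Single group univariate repeated measures F, blocking on subjects."""
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    treat_means = [mean(row[j] for row in scores) for j in range(k)]
    subj_means = [mean(row) for row in scores]

    ss_b = n * sum((m - grand) ** 2 for m in treat_means)
    ss_w = sum((row[j] - treat_means[j]) ** 2 for row in scores for j in range(k))
    ss_bl = k * sum((m - grand) ** 2 for m in subj_means)  # blocks (subjects)
    ss_res = ss_w - ss_bl  # what remains after removing subject differences

    ms_b = ss_b / (k - 1)
    ms_res = ss_res / ((n - 1) * (k - 1))
    return {"ss_bl": ss_bl, "ss_res": ss_res, "ms_res": ms_res, "f": ms_b / ms_res}


rm = repeated_measures_anova(scores)
print({name: round(value, 2) for name, value in rm.items()})
```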


5.6 ASSUMPTIONS IN REPEATED MEASURES 
ANALYSIS 


The three assumptions for a single group univariate repeated measures analysis 
are: 


1. independence of the observations 
2. multivariate normality 
3. sphericity (sometimes called circularity) 


The first two assumptions are also required for the multivariate approach, but 
the sphericity assumption is not necessary. The reader should recall from Chapter 2 
that a violation of the independence assumption is very serious for independent 
samples ANOVA, and the same holds true for repeated measures analysis. 
Multivariate normality is somewhat difficult to characterize; however, it does re- 
quire normality on each of the individual measures. Recall again from Chapter 2 
that ANOVA was robust against non-normality. There is also a fair amount of evi- 
dence to suggest (Stevens, 1986, p. 207) that MANOVA is also robust against lack 
of multivariate normality, with respect to type I error. 

Before we specify what sphericity means, we wish to note that for many years it 
was thought that a stronger condition called uniformity (compound symmetry) 
was necessary. The uniformity condition required equality of the population
variances for all treatments and also that all population covariances be equal.
Schematically, for three repeated measures the uniformity condition looks like this:


$$\begin{array}{c|ccc}
  & 1 & 2 & 3 \\ \hline
1 & \sigma^2 & \sigma_c & \sigma_c \\
2 & \sigma_c & \sigma^2 & \sigma_c \\
3 & \sigma_c & \sigma_c & \sigma^2
\end{array}$$

In the above, $\sigma^2$ represents the common population variance for the three
repeated measures, and $\sigma_c$ represents the common population covariance. Huynh
and Feldt (1970) and Rouanet and Lepine (1970) independently showed that
sphericity is an exact condition for the F test to be valid. Sphericity only requires that
the variances of the differences for all pairs of repeated measures be equal.
Sphericity is a weaker condition than uniformity, and defines an additional class of
situations where the univariate approach is valid. Consider the covariance matrix 
below: 


$$S = \begin{array}{c|ccc}
    & y_1 & y_2 & y_3 \\ \hline
y_1 & 1.0 & 0.5 & 1.5 \\
y_2 & 0.5 & 3.0 & 2.5 \\
y_3 & 1.5 & 2.5 & 5.0
\end{array}$$


The formula for the variance of the difference scores for the ith and jth repeated
measures is given by

$$s^2_{i-j} = s_i^2 + s_j^2 - 2s_{ij}$$

Now we calculate the variance for the differences for each pair of repeated
measures:


$$s^2_{1-2} = s_1^2 + s_2^2 - 2s_{12} = 1 + 3 - 2(.5) = 3$$

$$s^2_{1-3} = s_1^2 + s_3^2 - 2s_{13} = 1 + 5 - 2(1.5) = 3$$

$$s^2_{2-3} = s_2^2 + s_3^2 - 2s_{23} = 3 + 5 - 2(2.5) = 3$$


The variances are the same for all difference variables, which means the
sphericity condition is met, even though uniformity is most definitely not satisfied
(all the variances are unequal and the covariances are all unequal).
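
The check just carried out by hand can be sketched in Python for any covariance matrix. The function name `difference_variances` is our own assumption for illustration; it simply applies the formula for the variance of a difference to every pair of repeated measures.

```python
from itertools import combinations

# Covariance matrix from the example above (measures y1, y2, y3).
S = [
    [1.0, 0.5, 1.5],
    [0.5, 3.0, 2.5],
    [1.5, 2.5, 5.0],
]


def difference_variances(cov):
    """Variance of (y_i - y_j) for each pair: s_i^2 + s_j^2 - 2 s_ij."""
    k = len(cov)
    return {
        (i + 1, j + 1): cov[i][i] + cov[j][j] - 2 * cov[i][j]
        for i, j in combinations(range(k), 2)
    }


diff_vars = difference_variances(S)
for pair, value in diff_vars.items():
    print(pair, value)
```

If all of the printed values are equal, the matrix satisfies sphericity; in sample data one would of course expect only approximate equality.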

The multivariate approach to repeated measures is valid for any covariance ma- 
trix for the repeated measures. 

Box (1954) showed that if the sphericity assumption is not met, then the F ratio
for the univariate approach is positively biased (we are rejecting falsely too often).
In other words, we may set our α level at .05, but may be rejecting falsely 8% or
10% of the time. The extent to which the covariance matrix deviates from
sphericity is reflected in a parameter called $\varepsilon$ (Greenhouse & Geisser, 1959). We
give the formula in Exercise 3 for those who are interested. Since $\hat{\varepsilon}$ is printed
out by SPSS and SAS, there is no need to go through all the tedious calculations. If
sphericity is met, then $\varepsilon = 1$, while for the worst possible violation the value of
$\varepsilon = 1/(k - 1)$, where k is the number of repeated measures. To adjust for the
positive bias Greenhouse and Geisser suggest altering the degrees of freedom from


$$(k - 1) \text{ and } (k - 1)(n - 1) \quad \text{to} \quad 1 \text{ and } (n - 1)$$


that is, dividing both degrees of freedom by (k - 1).

Doing this makes the test very conservative, since adjustment is made for the
worst possible case, and we don't recommend it. A more reasonable approach is to
estimate $\varepsilon$. Then adjust the degrees of freedom from

$$(k - 1) \text{ and } (k - 1)(n - 1) \quad \text{to} \quad \hat{\varepsilon}(k - 1) \text{ and } \hat{\varepsilon}(k - 1)(n - 1)$$


Results from Collier, Baker, Mandeville, and Hayes (1967) and Stoloff (1967) 
show that this approach keeps the actual α very close to the level of significance.

Huynh and Feldt (1976) found that even multiplying the degrees of freedom by
$\hat{\varepsilon}$ is somewhat conservative when the true value of $\varepsilon$ is above about .70.
They recommended using the following for those situations:


$$\tilde{\varepsilon} = \frac{n(k - 1)\hat{\varepsilon} - 2}{(k - 1)[(n - 1) - (k - 1)\hat{\varepsilon}]}$$


The above Huynh epsilon can be printed out by both SPSS MANOVA and SAS 
GLM. 
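
For readers who wish to see how these adjustments fit together, the following is a minimal sketch in Python. Several things here are assumptions on our part: the function names are hypothetical, and since the formula for the Greenhouse-Geisser estimate is not reproduced in this section, `gg_epsilon` uses the form of that estimator commonly given in terms of the sample covariance matrix, which the reader should verify against Exercise 3. The Huynh-Feldt function follows the expression given above; it can exceed 1, in which case the value is conventionally treated as 1.

```python
def gg_epsilon(cov):
    """Greenhouse-Geisser estimate from a k x k sample covariance matrix.

    Assumed form: k^2 (mean diagonal - grand mean)^2 divided by
    (k - 1)(sum of squared entries - 2k * sum of squared row means
    + k^2 * grand mean^2).
    """
    k = len(cov)
    mean_diag = sum(cov[i][i] for i in range(k)) / k
    grand_mean = sum(sum(row) for row in cov) / k ** 2
    row_means = [sum(row) / k for row in cov]
    sum_sq = sum(x ** 2 for row in cov for x in row)
    numerator = k ** 2 * (mean_diag - grand_mean) ** 2
    denominator = (k - 1) * (
        sum_sq - 2 * k * sum(m ** 2 for m in row_means) + k ** 2 * grand_mean ** 2
    )
    return numerator / denominator


def hf_epsilon(eps_hat, n, k):
    """Huynh-Feldt adjustment of eps_hat for n subjects and k measures."""
    return (n * (k - 1) * eps_hat - 2) / ((k - 1) * ((n - 1) - (k - 1) * eps_hat))


def adjusted_df(eps, n, k):
    """Multiply both degrees of freedom of the univariate F test by eps."""
    return eps * (k - 1), eps * (k - 1) * (n - 1)


# The covariance matrix shown earlier to satisfy sphericity.
S = [[1.0, 0.5, 1.5], [0.5, 3.0, 2.5], [1.5, 2.5, 5.0]]
print(gg_epsilon(S))
print(adjusted_df(0.5, n=5, k=3))  # worst case for k = 3: epsilon = 1/(k - 1)
```

For the worst-case value with k = 3 and n = 5, the adjusted degrees of freedom reduce to 1 and (n - 1), in agreement with the conservative adjustment described above.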

There are statistical tests for checking sphericity, for example, the Mauchly
test presented on SPSS. However, based on the results of Monte Carlo studies
(Keselman, Rogan, Mendoza, & Breen, 1980), we don’t recommend using these
tests.


5.7 SHOULD WE USE THE UNIVARIATE 
OR MULTIVARIATE APPROACH? 


In terms of controlling type I error, there is no real basis for preferring the
multivariate approach, since use of the modified (adjusted) univariate test (i.e.,
multiplying the degrees of freedom by $\hat{\varepsilon}$) yields an honest error rate. The choice
then involves a question of power. Now assuming sphericity, the univariate test is 
more powerful. When sphericity is violated, however, the situation is much more 
complex. Davidson (1972) has stated, “when small but reliable effects are present 
with the effects being highly variable ... the multivariate test is far more powerful 
than the univariate test" (p. 452). And O'Brien and Kaiser (1985), after mentioning 
several studies that compared the power of the multivariate and modified 
univariate tests, state, “Even though a limited number of situations has been inves- 
tigated, this work found that no procedure is uniformly more powerful or even usu- 
ally the most powerful" (p. 319). Thus, given an exploratory study, we agree with 
Barcikowski and Robey (1984), who recommend that both the univariate and 
multivariate tests be routinely used because they may differ in the treatment effects 
that they discern. In such a study half the experimentwise level of significance 
might be set for each test. Thus, if we wish our overall α = .05, simply do each test
at α = .025.




5.8 COMPUTER ANALYSIS ON SAS AND SPSS 
FOR EXAMPLE 


In Table 5.1 we present the complete control lines for running the single group re- 
peated measures example given in Section 5.3 on SAS GLM and SPSS MANOVA. 
Table 5.2 gives the means and standard deviations for the three repeated measures 
variables, and Table 5.3 presents selected, annotated output from SAS GLM. 


TABLE 5.1
SAS and SPSS Control Lines for Single Group Repeated Measures

SAS

        TITLE 'SINGLE GP REPEATED MEASURES';
        DATA SINGLE;
    (1) INPUT SUBJ TREAT REAC @@;
        LINES;
    (2) 1 1 30 1 2 28 1 3 34
        2 1 14 2 2 18 2 3 22
        3 1 24 3 2 20 3 3 30
        4 1 38 4 2 34 4 3 44
        5 1 26 5 2 28 5 3 30
        PROC PRINT;
        PROC GLM;
    (3) CLASS SUBJ TREAT;
        MODEL REAC = SUBJ TREAT;

SPSS

        TITLE 'REPEATED MEASURES'.
        DATA LIST FREE/Y1 Y2 Y3.
        BEGIN DATA.
        30 28 34 14 18 22 24 20 30
        38 34 44 26 28 30
        END DATA.
        LIST.
        MANOVA Y1 Y2 Y3/
    (4) WSFACTOR = TREAT(3)/
        WSDESIGN/
        ANALYSIS(REPEATED)/
    (5) PRINT = TRANSFORM CELLINFO(MEANS)
    (6) SIGNIF(UNIV AVERF)/.


(D In order to run the single group repeated measures оп SAS we treat it as a two way ANOVA, with 
subjects and treatments as the grouping variables and reaction as the dependent variable. 

Q The first two numbers of each block of three gives the cell identification. Thus, the first subject in 
treatment 1 (1 1) had a reaction score of 30, the second subject in treatment 3 (2 3) had a reaction score 
of 22, etc. 

G In the CLASS statement we list the classification or grouping variables, which here are subject 
and treatment. 

@ The WSFACTOR (within subject factor) and the WSDESIGN (within subject design) are funda- 
mental to running repeated measures analysis on SPSS MANOVA. In the WSFACTOR subcommand 
we specify which are the repeated measures, or within subject, factors. And we indicate, in parentheses, 
the number of levels for each repeated measures factor. The WSDESIGN specifies the design on the re- 
peated measures. Here it is simply the treatment effect. 

© The TRANSFORM part of the PRINT subcommand prints out in columns the uncorrelated, 
transformed variables that are created by the program for the multivariate approach (cf. Stevens, 1986, 
Chapter 13). 

© The UNIV is necessary to obtain the significance tests for each of the transformed variables cre- 
ated by the program for the multivariate approach to repeated measures. The AVERF yields the unad- 
justed, overall univariate test for repeated measures. 
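
The two data layouts in Table 5.1 hold the same 15 scores arranged differently: SPSS reads one row per subject, while the SAS run reads subject, treatment, score triples. The following sketch, written in Python purely for illustration (it is not part of either package run, and the function name wide_to_long is a hypothetical helper), shows the conversion from the first layout to the second.

```python
# Wide layout: one row per subject, one column per treatment,
# as read by the SPSS DATA LIST statement in Table 5.1.
wide = [
    (30, 28, 34),
    (14, 18, 22),
    (24, 20, 30),
    (38, 34, 44),
    (26, 28, 30),
]


def wide_to_long(rows):
    """Return (subject, treatment, score) triples, numbering both from 1."""
    return [
        (subj, treat, score)
        for subj, row in enumerate(rows, start=1)
        for treat, score in enumerate(row, start=1)
    ]


long_rows = wide_to_long(wide)
for triple in long_rows:
    print(*triple)  # one SUBJ TREAT REAC block per line
```

Each printed line corresponds to one block of three numbers in the SAS data lines, so the subject 2, treatment 3 block carries the score 22.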
TABLE 5.2 
Means and Standard Deviations for the Drug Data 


Placeholder for art from p. 213 of previous edition 


5.9 POST HOC PROCEDURES IN REPEATED 
MEASURES ANALYSIS 


As in a one way ANOVA, if an overall difference is found, one would almost al- 
ways want to determine where the differences lie. This involves a post hoc proce- 
dure. There are several reasons for preferring pairwise procedures: (1) they are eas- 
ily interpreted, (2) they are quite meaningful, and (3) some of these procedures are 
fairly powerful. The Tukey procedure is appropriate in repeated measures analysis, 
provided that the sphericity assumption is met. For the drug example this assumption 
is tenable. We apply the Tukey there, setting overall α = .05. Thus, we take at 
most a 5% chance of one or more false rejections. Recall that we discussed the 
Tukey procedure in Chapter 2. Remember that the studentized range statistic (denoted 
by q) is used in the procedure. If there are k groups and the total sample size 
is N, then any two means are declared significantly different at the .05 level if the 
following inequality is satisfied: 


$$|\bar{x}_i - \bar{x}_j| > q_{.05;\,k,\,N-k}\,\sqrt{MS_w / n},$$


where $MS_w$ is the error term for the one way ANOVA, and n is the common group 
size. The modification of the Tukey for the one sample repeated measures design is 


$$|\bar{x}_i - \bar{x}_j| > q_{.05;\,k,\,(n-1)(k-1)}\,\sqrt{MS_{res} / n} \qquad (1)$$


where (n − 1)(k − 1) is the error degrees of freedom and $MS_{res}$ is the error term, 
replacing $MS_w$. 
TABLE 5.4 
Type I Error Rates for the Tukey and Bonferroni Procedures under 
Different Violations of the Sphericity Assumption 


 n     k    min ε     ε      Tukey   Bonf 
15     3     .50     1.00    .041    .039 
15     3     .50      .86    .043    .036 
15     3     .50      .74    .051    .033 
15     3     .50      .54    .073    .033 
15     4     .33     1.00    .045    .043 
15     4     .33      .53    .081    .030 
15     4     .33      .49    .087    .036 
15     5     .25     1.00    .050    .040 
15     5     .25      .831   .061    .044 
15     5     .25      .752   .067    .042 
15     5     .25      .522   .081    .038 
 8     3     .50      .860   .048    .042 
 8     3     .50      .740   .054    .038 
 8     3     .50      .540   .078    .036 
 8     4     .33      .530   .084    .042 
 8     4     .33      .490   .095    .032 
 8     5     .25      .831   .058    .044 
 8     5     .25      .752   .060    .042 
 8     5     .25      .522   .076    .044 


The means, from Table 5.2, are 26.4, 25.6, and 32. If we set overall α = .05, then 
the appropriate studentized range value is $q_{.05;\,3,\,8} = 4.041$. The error term, from 
Table 5.3, is 4.8 and the number of subjects is n = 5. Therefore, two treatments will be 
declared significantly different if 


$$|\bar{x}_i - \bar{x}_j| > 4.041\sqrt{4.8/5} = 3.96$$


Thus, treatment 3 differs from treatments 1 and 2, but treatments 1 and 2 are not 
significantly different (as one would have suspected). 
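
The arithmetic behind this decision can be retraced from the raw scores in Table 5.1. The sketch below, in Python for illustration only, computes the residual mean square directly from its definition and then the Tukey critical difference. The studentized range value 4.041 is taken from the text rather than computed, and all variable names are illustrative assumptions.

```python
from math import sqrt

# Rows are subjects, columns are treatments 1-3 (data of Table 5.1).
scores = [
    (30, 28, 34), (14, 18, 22), (24, 20, 30), (38, 34, 44), (26, 28, 30),
]
n, k = len(scores), len(scores[0])

grand = sum(map(sum, scores)) / (n * k)
treat_means = [sum(row[j] for row in scores) / n for j in range(k)]
subj_means = [sum(row) / k for row in scores]

# Residual sum of squares: what remains of each score after the
# subject effect and the treatment effect have been removed.
ss_res = sum(
    (scores[i][j] - subj_means[i] - treat_means[j] + grand) ** 2
    for i in range(n)
    for j in range(k)
)
ms_res = ss_res / ((n - 1) * (k - 1))

Q_CRIT = 4.041  # q(.05; 3, 8), quoted from the text, not computed here
critical_diff = Q_CRIT * sqrt(ms_res / n)

# Flag each pair of treatments whose means differ by more than the cutoff.
significant = {
    (a + 1, b + 1): abs(treat_means[a] - treat_means[b]) > critical_diff
    for a in range(k)
    for b in range(a + 1, k)
}
print(treat_means, round(ms_res, 2), round(critical_diff, 2), significant)
```

The residual mean square reproduces the error term of 4.8, and only the pairs involving treatment 3 exceed the critical difference.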

There are several other pairwise procedures that Maxwell (1980) discusses in a 
Monte Carlo study that compared how well the procedures control overall α when the 
sphericity assumption is violated. We present his results for the Tukey and 
Bonferroni approaches in Table 5.4. The Bonferroni approach in the repeated measures 
context involves the use of multiple dependent t tests. For example, if there 
are five treatments, then there will be ten paired comparisons. If we wish overall α 
= .05, then we simply do each dependent t test at the .05/10 = .005 level of significance. 
Results from Table 5.4 show that the Bonferroni approach keeps the actual 
α below the level of significance in all cases, even when there is a severe violation of the 
sphericity assumption (e.g., for k = 3 the min ε = .50 and one of the conditions 
modeled had ε = .54). Because of this, Maxwell recommended the Bonferroni approach 
for post hoc pairwise comparisons in repeated measures analysis if the 
sphericity assumption is violated. Maxwell also studied the power of the five approaches, 
and found the Tukey to be most powerful. When ε > .70 in Table 5.4, the 
deviation of actual α from the level of significance is less than .02 for the Tukey 
procedure. This, coupled with the fact that the Tukey tends to be most powerful, 
would lead us to prefer the Tukey when ε > .70. When ε < .70, however, we 
agree with Maxwell that the Bonferroni approach should be used. 
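
The Bonferroni bookkeeping and the ε-based choice between procedures described above can be summarized in a few lines. This is a minimal Python sketch for illustration; the function names are hypothetical, and assigning the boundary case ε = .70 to the Bonferroni branch is an assumption, since the text addresses only ε > .70 and ε < .70.

```python
from math import comb


def bonferroni_alpha(k, overall_alpha=0.05):
    """Return (number of paired comparisons, per-comparison alpha) for k treatments."""
    n_pairs = comb(k, 2)  # k(k - 1)/2 pairwise comparisons
    return n_pairs, overall_alpha / n_pairs


def choose_procedure(epsilon, cutoff=0.70):
    """Pick the post hoc procedure from the sphericity estimate epsilon."""
    # Assumption: epsilon exactly equal to the cutoff goes to Bonferroni.
    return "Tukey" if epsilon > cutoff else "Bonferroni"


pairs, per_test = bonferroni_alpha(5)
print(pairs, per_test)
print(choose_procedure(0.86), choose_procedure(0.54))
```

With five treatments the function returns ten comparisons, each tested at the .005 level, matching the example in the text.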


5.10 ONE BETWEEN AND ONE WITHIN 
FACTOR—A TREND ANALYSIS 


We now consider a slightly more complex design, adding a grouping (between) 
variable. An investigator interested in verbal learning randomly assigns 16 subjects 
to two treatments (8 per group). She obtains recall scores on verbal material after 1, 2, 3, 4, 
and 5 days. Treatments is the grouping variable. She expects there to be a signifi- 
cant effect over time, but wishes a more focused assessment. She wants to mathe- 
matically model the form of the decline in verbal recall. For this, trend analysis is 
appropriate and in particular orthogonal (uncorrelated) polynomials are in order. If 
the decline in recall is essentially constant over the days, then a significant linear 
(straight line) trend, or first degree polynomial, will be found. On the other hand, if 
the decline in recall is slow over the first two days and then drops sharply over the 
remaining 3 days, a quadratic trend (part of a parabola), or second degree polyno- 
mial, will be found. Finally, if the decline is slow at first, then drops off sharply for 
the next few days and finally levels off, we will find a cubic trend, or third degree 
polynomial. We illustrate each of these cases below: 


[Figure: three panels labeled Linear, Quadratic, and Cubic, each plotting verbal recall on the vertical axis] 


The fact that the polynomials are uncorrelated means that the linear, quadratic, 
cubic, and quartic components are partitioning distinct (different) parts of the vari- 
ation in the data. 
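
The claim that the components are uncorrelated can be checked numerically. The sketch below, in Python for illustration only, uses the standard integer orthogonal polynomial coefficients for five equally spaced levels (the values commonly tabled in textbooks) and verifies that every pair of coefficient sets has a zero cross product. The variable names are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt

# Standard integer coefficients for 5 equally spaced levels.
coeffs = {
    "linear":    (-2, -1,  0,  1, 2),
    "quadratic": ( 2, -1, -2, -1, 2),
    "cubic":     (-1,  2,  0, -2, 1),
    "quartic":   ( 1, -4,  6, -4, 1),
}


def dot(u, v):
    """Sum of cross products of two coefficient sets."""
    return sum(a * b for a, b in zip(u, v))


# Orthogonality: every pair of trends has a zero cross product.
pairwise = {(a, b): dot(coeffs[a], coeffs[b]) for a, b in combinations(coeffs, 2)}

# Each set is a contrast: its coefficients sum to zero.
sums = {name: sum(c) for name, c in coeffs.items()}

# Rescaling so the squared coefficients sum to 1 gives the normalized form
# in which packages often print these coefficients.
normalized = {
    name: tuple(x / sqrt(dot(c, c)) for x in c) for name, c in coeffs.items()
}

print(pairwise)
print(sums)
```

Because all six cross products are zero, the four trend components account for non-overlapping portions of the variation across the five days.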

In Table 5.5 we present the SAS and SPSS control lines for running the trend 
analysis on this verbal recall data. In Chapter 2, in discussing planned compari- 
sons, we indicated that several types of contrasts are available in SPSS MANOVA 
(Helmert, special, polynomial, etc.), and we also illustrated the use of the Helmert 
and special contrasts; here the polynomial contrast option is used. Recall these are 
built into the program, so that all we need do is request them, which is what has 
been done in the CONTRAST subcommand. 

When several groups are involved, as in our verbal recall example, an addi- 
tional assumption is homogeneity of the covariance matrices on the repeated mea- 
sures for the groups. In our example the group sizes are equal, and in this case a vi- 
olation of the equal covariance matrices assumption is not serious. That is, the test 
statistic is robust (with respect to type I error) against a violation of this assumption 
(cf. Stevens, 1986, Chapter 6). However, if the group sizes are substantially un- 
equal, then a violation is serious, and we indicate in Table 5.5 what should be 
added to test the assumption. 

Table 5.6 gives the means and standard deviations for the two groups on the 5 
repeated measures. In Table 5.7 we present selected, annotated output from SPSS 
MANOVA for the trend analysis. Results from that table show that the groups do 
not differ significantly (F = .04, p = .837) and that there is not a significant group 
by days interaction (F = 1.2, p = .323). There is, however, a quite significant days 
main effect, and in particular, the LINEAR and CUBIC trends are significant at the 
.05 level (F = 239.14, p < .001 and F = 10.51, p = .006, respectively). The linear 
trend is by far the most pronounced, and a graph of the means for the data in Figure 
5.1 shows this, although a cubic curve (with a few bends) will fit the data slightly 
better. 
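
To see why the linear trend dominates, one can apply the polynomial coefficients to the day means computed from the scores listed in Table 5.5. The sketch below, in Python for illustration only, pools the two groups and reports each normalized contrast value. It does not reproduce the F tests, which also require the error terms from the SPSS output; the coefficient sets are the standard integer values for five equally spaced levels, and the variable names are illustrative assumptions.

```python
# Recall scores from Table 5.5: (group, day 1, ..., day 5), one tuple per subject.
data = [
    (1, 26, 20, 18, 11, 10), (1, 34, 35, 29, 22, 23),
    (1, 41, 37, 25, 18, 15), (1, 29, 28, 22, 15, 13),
    (1, 35, 34, 27, 21, 17), (1, 28, 22, 17, 14, 10),
    (1, 38, 34, 28, 25, 22), (1, 43, 37, 30, 27, 25),
    (2, 42, 38, 26, 20, 15), (2, 31, 27, 21, 18, 13),
    (2, 45, 40, 33, 25, 18), (2, 29, 25, 17, 13, 8),
    (2, 29, 32, 28, 22, 18), (2, 33, 30, 24, 18, 7),
    (2, 34, 30, 25, 24, 23), (2, 37, 31, 25, 22, 20),
]

coeffs = {
    "linear":    (-2, -1,  0,  1, 2),
    "quadratic": ( 2, -1, -2, -1, 2),
    "cubic":     (-1,  2,  0, -2, 1),
    "quartic":   ( 1, -4,  6, -4, 1),
}

n = len(data)
# Marginal mean for each day, pooled over both groups.
day_means = [sum(row[d] for row in data) / n for d in range(1, 6)]


def contrast(weights, means):
    """Weighted sum of the means for one set of trend coefficients."""
    return sum(w * m for w, m in zip(weights, means))


# Dividing by the root sum of squared weights puts the trends on a common scale.
trend = {
    name: contrast(w, day_means) / sum(c * c for c in w) ** 0.5
    for name, w in coeffs.items()
}

print([round(m, 2) for m in day_means])
for name, value in trend.items():
    print(f"{name:9s} {value:8.3f}")
```

The day means fall steadily from day 1 to day 5, and the linear contrast is much larger in absolute value than the other three, which is consistent with the pattern of F values reported from Table 5.7.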

In concluding this example, the following from Myers (1979) is important: 


Trend or orthogonal polynomial analyses should never be routinely applied when- 
ever one or more independent variables are quantitative.... It is dangerous to identify 
statistical components freely with psychological processes. It is one thing to postu- 
late a cubic component of A, to test for it, and to find it significant, thus substantiating 
the theory. It is another matter to assign psychological meaning to a significant com- 
ponent that has not been postulated on a priori grounds. (p. 456) 


TABLE 5.5 
SAS Control Lines and SPSS Command Syntax File for One Between 
and One Within Repeated Measures Analysis 

SAS 

    TITLE '1 BETW & 1 WITHIN'; 
    DATA TREND; 
    INPUT GPID Y1 Y2 Y3 Y4 Y5; 
    CARDS; 
    1 26 20 18 11 10 
    1 34 35 29 22 23 
    1 41 37 25 18 15 
    1 29 28 22 15 13 
    1 35 34 27 21 17 
    1 28 22 17 14 10 
    1 38 34 28 25 22 
    1 43 37 30 27 25 
    2 42 38 26 20 15 
    2 31 27 21 18 13 
    2 45 40 33 25 18 
    2 29 25 17 13 8 
    2 29 32 28 22 18 
    2 33 30 24 18 7 
    2 34 30 25 24 23 
    2 37 31 25 22 20 
    PROC GLM; 
    CLASS GPID; 
    MODEL Y1 Y2 Y3 Y4 Y5 = GPID; 
(1) REPEATED DAY 5 (1 2 3 4 5) 
(2) POLYNOMIAL/SUMMARY; 

SPSS 

    TITLE 'ONE BETWEEN AND ONE WITHIN - INTERM. BOOK P. 204'. 
    DATA LIST FREE/GPID Y1 Y2 Y3 Y4 Y5. 
    BEGIN DATA. 
    1 26 20 18 11 10 1 34 35 29 22 23 
    1 41 37 25 18 15 
    1 29 28 22 15 13 1 35 34 27 21 17 
    1 28 22 17 14 10 
    1 38 34 28 25 22 1 43 37 30 27 25 
    2 42 38 26 20 15 2 31 27 21 18 13 
    2 45 40 33 25 18 
    2 29 25 17 13 8 2 29 32 28 22 18 
    2 33 30 24 18 7 
    2 34 30 25 24 23 2 37 31 25 22 20 
    END DATA. 
    LIST. 
    MANOVA Y1 TO Y5 BY GPID(1,2)/ 
(3) WSFACTOR = DAY(5)/ 
(4) CONTRAST(DAY) = POLYNOMIAL/ 
    WSDESIGN = DAY/ 
(5) RENAME = MEAN, LINEAR, QUAD, CUBIC, QUART/ 
    PRINT = TRANSFORM CELLINFO(MEANS) 
    SIGNIF(AVERF)/ 
    ANALYSIS(REPEATED)/ 
(6) DESIGN = GPID/. 


(1) The REPEATED statement is fundamental for running repeated measures designs on SAS. The 
general form is REPEATED factorname levels (level values) transformation/options; note that the 
level values are in parentheses. We are interested in polynomial contrasts on the repeated measures, and 
so that is what has been requested. Other transformations are available (HELMERT, PROFILE, etc.; 
see SAS USER's GUIDE: STATISTICS, Version 5, p. 454). 

(2) SUMMARY here produces ANOVA tables for each contrast defined by the within subject factors. 

(3) Recall again that the WSFACTOR (within subject factor) and the WSDESIGN (within subject 
design) subcommands are fundamental for running multivariate repeated measures analysis on SPSS. 

(4) If we wish trend analysis on the DAY repeated measure variable, then all we need do is request 
POLYNOMIAL on the CONTRAST subcommand. 

(5) In this RENAME subcommand we are giving meaningful names to the polynomial contrasts 
being generated. 

(6) It is important to realize that with SPSS MANOVA there is a design subcommand (WSDESIGN) 
for the within or repeated measures factor(s) and a separate DESIGN subcommand for the 
between (grouping) factor(s). 
TABLE 5.6 
Means and Standard Deviations for One Between and One Within 
Repeated Measures 


Placeholder for T0506 on p. 219 of previous edition 


5.11 POST HOC PROCEDURES FOR THE ONE 
BETWEEN AND ONE WITHIN DESIGN 


In the one between and one within, or mixed model, repeated measures design, we 
have both the assumption of sphericity and homogeneity of the covariance matri- 
ces for the different levels of the between factor. This combination of assumptions 
has been called multisample sphericity. Keselman and Keselman (1988) con- 
ducted a Monte Carlo study examining how well four post hoc procedures con- 
trolled overall alpha under various violations of multisample sphericity. The four 


TABLE 5.7 
Selected Output from SPSS for One Between and One Within 


Insert T0507 from p. 220 of previous edition 


(1) The group and group by days interaction are not significant, although the unadjusted DAYS main 
effect is significant at the .05 level. 

(2) The last four columns of numbers are the coefficients for orthogonal polynomials, although they 
may look strange since each column is scaled such that the sum of the squared coefficients equals 1. 
Textbooks typically present the coefficients for 5 levels as follows: 


Linear      -2   -1    0    1    2 
Quadratic    2   -1   -2   -1    2 
Cubic       -1    2    0   -2    1 
Quartic      1   -4    6   -4    1 


Compare, for example, Fundamentals of Experimental Design, Myers, 1979, p. 548. 

(3) This value of ε̂ indicates a severe violation of the sphericity assumption, although the adjusted 
univariate test is still easily significant at the .05 level. 
procedures were: the Tukey, a modified Tukey employing a nonpooled estimate of 
error, a Bonferroni t statistic, and a t statistic with a multivariate critical value. 
These procedures were also used in the Maxwell (1980) study of post hoc procedures 
for the single group repeated measures design. 

Keselman and Keselman set the number of groups at 3 and considered 4 and 8 
levels for the within (repeated) factor. They considered both equal and unequal 
group sizes for the between factor. Recall that ε quantifies departure from sphericity: 
ε = 1 means sphericity, and ε = 1/(k − 1) indicates maximum departure from 
sphericity. They investigated ε = .75 (a relatively mild departure) and ε = .40 (a severe 
departure for the 4 level case, given that the minimum value there would be .33). 
Selected results from their study are presented below for the four level within factor 
case. 


                                                     Tukey (pooled)   Bonferroni   Multivariate 

ε = .75 
  equal covariance matrices & gp sizes                    6.34           3.46          1.70 
  unequal covariance matrices, but equal group sizes      7.22           4.32          2.48 
  unequal covariance matrices and gp sizes 
    (larger variability with smaller group size)         14.78          11.38          7.04 

ε = .40 
  equal covariance matrices & gp sizes                   11.36           2.38          1.16 
  unequal covariance matrices, but equal group sizes     10.08           2.70          1.56 
  unequal covariance matrices and gp sizes 
    (larger variability with smaller group size)         17.80           6.34          3.94 


The group sizes for the values presented above were 13, 10, and 7. The entries in 
the body of the table are percentages, to be compared against an overall alpha of .05 (i.e., 5%). 
FIGURE 5.1 Linear and Cubic Plots for Verbal Recall Data 
(vertical axis: verbal recall; horizontal axis: days) 


The above results show that the Bonferroni approach keeps the overall alpha 
less than .05, provided you do not have both unequal group sizes and unequal 
covariance matrices. If you want to be confident that you will be rejecting falsely 
no more than your level of significance, then this is the procedure of choice. In my 
opinion, the Tukey procedure is acceptable for ε = .75, as long as there are equal 
group sizes. For the other cases, the error rates for the Tukey are at least double the 
level of significance, and therefore not acceptable. 

Recall that the pooled Tukey procedure for the single group repeated measures 
design was to reject if 


$$|\bar{x}_i - \bar{x}_j| > q_{.05;\,k,\,(n-1)(k-1)}\,\sqrt{MS_{res} / n}$$


where n is the number of subjects, k is the number of levels, and $MS_{res}$ is the error 
term (Equation 1). 

For the one between and one within design with J groups and k within levels, we 
declare two marginal means (means for the repeated measures levels over the J 
groups) different if 


$$|\bar{x}_i - \bar{x}_j| > q_{.05;\,k,\,(N-J)(k-1)}\,\sqrt{MS_{error} / N}$$


where the mean square is the within subjects error term for the mixed model and N 
is the total number of subjects. 


5.12 ONE BETWEEN AND TWO WITHIN FACTORS 


We now consider the repeated measures analysis for a one between and two within 
design, using data from Elashoff (1981). Two groups of subjects are given three 
different doses of two drugs. There are several different questions of interest in this 
study. Will the drugs be differentially effective for different groups? Is the effec- 
tiveness of the drugs dependent on dose level? Is the effectiveness of the drugs de- 
pendent on both dose level and on the group? 

The design for this study is given below schematically: 


                      Drug 1              Drug 2 
Dose               D1   D2   D3        D1   D2   D3 
Measure            Y1   Y2   Y3        Y4   Y5   Y6 

Group 1 subjects 

Group 2 subjects 


Note that there are six measures for each subject (Y1 to Y6). Also, we have a 
crossed design on the within variables of drug and dose. 

The complete control lines for running this analysis on SAS are given in Table 
5.8. The means and standard deviations for the six repeated measures are given in 
Table 5.9. Table 5.10 presents the univariate analyses for this design from the SAS 
GLM program. Although the control lines for SAS in Table 5.8 yield both the 
TABLE 5.8 
SAS Control Lines for One Between and Two Within Repeated Measures 


TITLE 'ELASHOFF DATA'; 
DATA ELAS; 
INPUT GP Y1 Y2 Y3 Y4 Y5 Y6; 


1 19 22 28 16 26 22 

1 11 19 30 12 28 28 

1 20 24 24 24 22 29 

1 21 25 25 15 10 26 

1 18 24 29 19 26 28 

1 17 23 28 15 23 22 

1 20 23 23 26 21 28 

14 20 29 25 29 29 

16 20 24 30 34 36 

26 26 24 30 32 

23 33 36 45 

18 29 27 26 34 

21. 20 22 22 21 

25: 29° 29 33 

22 23 27 26 35 

20 22 23 26 28 

PROC GLM; 
CLASS GP; 
(1) MODEL Y1 Y2 Y3 Y4 Y5 Y6 = GP; 
(2) REPEATED DRUG 2, DOSE 3; 


(1) Recall that in the MODEL statement the dependent variables, the repeated measures here, go on 
the left side and the classification or grouping variable(s) go on the right side. 

(2) When there is more than one repeated measures factor, they must be separated by a comma, and 
the product of the levels of all factors must equal the number of dependent variables in the MODEL 
statement. 


univariate and multivariate analyses, we have just presented the univariate analy- 
ses because the Greenhouse-Geisser epsilons in Table 5.10 are greater than .70. 
For such a relatively mild violation of sphericity, it has been shown that the type I 
error rate remains at essentially the level of significance. 

Results from Table 5.10 show that we have significant drug, group, and dose 
main effects and a significant drug by group interaction at the .05 level. To ascertain 
what was responsible for the drug and group main effects and the drug by group 
interaction, we take the means from Table 5.9 and insert them into the design, yielding: 


TABLE 5.9 
Means and Standard Deviations for One Between 
and Two Within Repeated Measures 


Placeholder for T0509 from p. 227 of previous edition 


TABLE 5.10 
Univariate Analyses from SAS GLM for One Between and Two Within 


Placeholder for T0510 from previous edition 


(1) Since both ε's are > .70, the univariate approach is preferred, since the type I error rate is 
controlled and it may be more powerful than the multivariate approach. 

(2) Groups differ significantly at the .05 level. 

(3) The drug main effect and drug by group interaction are significant at the .05 level. The dose 
main effect is also significant at the .05 level. 

(4) Note that 4 different error terms are involved in this design, an additional complication with 
complex repeated measures designs. The error terms are boxed. 



                     Drug 1                      Drug 2 
Dose            1       2       3           1       2       3 
Group 1       17.5    22.5    27          19      21.88   26.5 
Group 2       19.63   22.38   24          26.88   28.63   33.00 


Now, collapsing on dose, the group by drug means are obtained: 


               Drug 1    Drug 2 
Group 1        22.33     22.46 
Group 2        22.00     29.50 


The mean in cell 11 (22.33) is simply the average of 17.5, 22.5, and 27, the 
mean in cell 12 (22.46) is the average of 19, 21.88, and 26.5, etc. Now it is apparent 
that the “outlier” cell mean of 29.50 is what was responsible for both main effects 
and the interaction being significant. Note that if this cell mean were about 22 or 
23, as the others, then none of the effects would have been significant. 
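
The collapsing step is just an average over the three dose means within each group by drug cell, which is legitimate here because every cell is based on the same number of subjects. A minimal Python sketch, for illustration only and with illustrative variable names, is:

```python
# Cell means from the design layout: (group, drug) -> means at doses 1, 2, 3.
cell_means = {
    (1, 1): (17.5, 22.5, 27.0),
    (1, 2): (19.0, 21.88, 26.5),
    (2, 1): (19.63, 22.38, 24.0),
    (2, 2): (26.88, 28.63, 33.0),
}

# Collapse over dose: average the three dose means in each cell.
group_by_drug = {
    cell: round(sum(means) / len(means), 2) for cell, means in cell_means.items()
}

for (group, drug), mean in group_by_drug.items():
    print(f"group {group}, drug {drug}: {mean:.2f}")
```

Three of the four collapsed means sit near 22, while the group 2, drug 2 cell stands apart at 29.50.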

Now, we obtain the level means for DOSE and then apply the Tukey procedure 
to see which dose levels differ significantly. The dose level means are 20.753, 
23.848, and 27.625. Now, two DOSE level means will differ significantly if 


$$|\bar{x}_i - \bar{x}_j| > q_{.05;\,3,\,28}\,\sqrt{10.391/16},$$


where 10.391 is the mean square error term for DOSE (cf. Table 5.10), 16 is the 
number of subjects for each dose, and 28 is the error degrees of freedom. Calculation 
yields: 


$$|\bar{x}_i - \bar{x}_j| > 3.486\sqrt{10.391/16} = 2.809$$


Since the smallest difference between any two level means is 3.095 for levels 1 
and 2, this means that all dose levels differ significantly from one another. 
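
The dose comparison can be retraced from the cell means given earlier in this section. The following Python sketch is for illustration only: the studentized range value 3.486 and the error mean square 10.391 are quoted from the text rather than computed, the dose level means are obtained by averaging the four cell means at each dose (valid because the groups are of equal size), and the variable names are illustrative assumptions.

```python
from math import sqrt

# Cell means: (group, drug) -> means at doses 1, 2, 3.
cell_means = {
    (1, 1): (17.5, 22.5, 27.0),
    (1, 2): (19.0, 21.88, 26.5),
    (2, 1): (19.63, 22.38, 24.0),
    (2, 2): (26.88, 28.63, 33.0),
}

# Dose level means: average over the four group-by-drug cells.
dose_means = [
    sum(means[d] for means in cell_means.values()) / len(cell_means)
    for d in range(3)
]

Q_CRIT = 3.486      # studentized range value used in the text
MS_ERROR = 10.391   # error mean square for DOSE, quoted from the text
N = 16              # total number of subjects

critical_diff = Q_CRIT * sqrt(MS_ERROR / N)

# Absolute difference for every pair of dose levels.
diffs = {
    (a + 1, b + 1): abs(dose_means[a] - dose_means[b])
    for a in range(3)
    for b in range(a + 1, 3)
}
all_differ = all(d > critical_diff for d in diffs.values())

print([round(m, 4) for m in dose_means], round(critical_diff, 3), all_differ)
```

Every pairwise difference, including the smallest one between doses 1 and 2, exceeds the critical difference.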


One Between and Two Within on SPSS for Windows 12.0 


Once the data are in the editor, click on ANALYZE and scroll down to 
GENERAL LINEAR MODEL. At this point the screen appears as at the top of 
Table 5.11. When you scroll across to GLM-REPEATED MEASURES and click, 
the screen at the left middle of Table 5.11 appears. Click within the WITHIN 
SUBJECT FACTOR NAME box and type in drug. Then click within the 
NUMBER OF LEVELS box and type in 2. The ADD box will light up; click on 
it. Do the same for DOSE (remember DOSE has 3 levels), click on ADD, and at 
this point the screen will appear as at the right middle in Table 5.11. When you 
click on DEFINE, the screen at the bottom of Table 5.11 appears. Click on y1 
and then click on the forward arrow to put y1 in position 1,1. Do the same for y2 
through y6. Finally, click on GP and click on the forward arrow to make GP a 
between subjects factor. The OK box will light up. Simply click on OK to run 
the analysis. 


5.13 TOTALLY WITHIN DESIGNS 


There are research situations where the same subjects are measured under various 
treatment combinations, that is, where the same subjects are in each cell of the de- 
sign. This may particularly be the case when not many subjects are available. We 
consider three examples to illustrate: 


Example 1 


A researcher in child development is interested in observing the same group of pre- 
school children (all 4 years of age) in two situations at two different times (morn- 
ing and afternoon) of the day. She is concerned with the extent of their social inter- 
action, and will measure this by having two observers independently rate the 
amount of social interaction. The average of the two ratings will serve as the de- 
pendent variable. The within factors here are situation and time of day. 

There are 4 scores for each child: social interaction in situation 1 in the morning 
and afternoon, and social interaction in situation 2 in the morning and afternoon. We 
denote the four scores by Y1, Y2, Y3, and Y4. 

Such a totally within repeated measures design is easily set up on SPSS 
MANOVA. The command syntax file is given below: 


TITLE 'TWO WITHIN DESIGN'. 
DATA LIST FREE/Y1 Y2 Y3 Y4. 
BEGIN DATA. 

DATA LINES 

END DATA. 
MANOVA Y1 TO Y4/ 
WSFACTOR = SIT(2),TIME(2)/ 
WSDESIGN/ 
PRINT = TRANSFORM CELLINFO(MEANS)/ 
ANALYSIS(REPEATED)/. 

TABLE 5.11 
SPSS for Windows 12.0 Screens for One Between and Two Within 
Repeated Measures 


[Screen captures not reproduced: numbered screens showing the Repeated Measures Define Factor(s) 
dialog, with drug (2 levels) and dose (3 levels) entered as within-subject factors, and the Repeated 
Measures dialog listing the within-subjects variables (drug, dose) and the between-subjects factor.] 

Use ANALYZE, GENERAL LINEAR MODEL, REPEATED MEASURES to get to screen 1. 
Click on ADD to go from screen 3 to screen 4. Click on DEFINE to go from screen 4 to screen 5. Click 
on OK in screen 6 to run the analysis. 


Example 2 


A social psychologist is interested in determining how self-reported anxiety level 
for 35-45 year old men varies as a function of situation, who they are with, and 
how many people are involved. A questionnaire will be administered to 20 such 
men, asking them to rate their anxiety level (on a Likert scale from 1 to 7) in 3 situations 
(going to the theater, going to a football game, and going to a dinner party), 
with primarily friends and primarily strangers, and with a total of 6 people and 
with 12 people. Thus, the men will be reporting anxiety for 12 different contexts. 
This is a three within, crossed repeated measures design, where situation (3 levels) 
is crossed with nature of group (2 levels) and with number in group (2 levels). 
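
The count of 12 contexts follows from fully crossing the three within factors. A short Python sketch, for illustration only and with illustrative level labels, enumerates them:

```python
from itertools import product

situations = ("theater", "football game", "dinner party")
companions = ("friends", "strangers")
group_sizes = (6, 12)

# Fully crossed design: every situation with every companion type and group size.
contexts = list(product(situations, companions, group_sizes))

for number, (situation, companion, size) in enumerate(contexts, start=1):
    print(f"Y{number}: {situation}, primarily {companion}, {size} people")
```

Each of the 20 men therefore contributes one rating per listed context, giving 12 repeated measures per subject.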


Example 3 


Suppose in an ergonomic study we are interested in the effects of day of the work 
week and time of the day (AM or PM) on various measures of posture. We select 
30 computer operators and for this example we consider just one measure of pos- 
ture called shoulder flexion. We then have a two factor totally within design which 
looks as follows: 


          Monday        Wednesday        Friday 
         AM    PM       AM    PM       AM    PM 
  1 
  2 
  3 
  . 
  . 
  . 
 30 
TABLE 5.12 
SPSS 12.0 Command Syntax File for Helmert Contrasts 
for a Repeated Measures Factor 


TITLE 'HELMERT CONTRASTS FOR REPEATED MEASURES'. 
DATA LIST FREE/Y1 Y2 Y3 Y4. 
BEGIN DATA. 
6.6. 1.3 2:5 2.1 3.0 1.4 3.8 4.4 4.7 4.5 5.8 4.7 
6.2 6.1 6.1 6.7 3.2 6.6 7.6 8.3 2.5 6,2 8.0 8.2 
2.8 3.6 4.4 4.3 Lied, Anh Set 528 2.9 4.9 6.3 6.4 
5.5 4.3 5.6 4.8 
END DATA. 
LIST. 
(1) MANOVA Y1 TO Y4/ 
    WSFACTOR = DRUGS(4)/ 
(2) CONTRAST(DRUGS) = HELMERT/ 
    WSDESIGN = DRUGS/ 
(3) RENAME = MEAN, HELMERT1, HELMERT2, HELMERT3/ 
    PRINT = CELLINFO(MEANS) TRANSFORM/ 
    ANALYSIS(REPEATED)/. 


(1) Recall that the four repeated measures are treated as 4 dependent variables. 

(2) Since HELMERT is one of the standard set of contrasts available in SPSS MANOVA, all we need 
do is request it in the CONTRAST subcommand. 

(3) We have simply given meaningful names to the transformed variables. The first transformed variable 
is a general mean, and the last 3 transformed variables are the Helmert contrasts whose 
significance we wish to test. 


5.14 PLANNED COMPARISONS IN REPEATED 
MEASURES DESIGNS 


Planned comparisons can be easily set up on SPSS MANOVA or on SAS GLM for 
repeated measures factors. To illustrate, we consider data from a study reported by 
Bock (1975). The study involved the effect of three drugs on the duration of sleep 
for 10 mental patients. The drugs were given orally on alternate evenings, and the 
hours of sleep were compared with an intervening control night. Each of the drugs 
was tested a number of times with each patient. The average number of hours of 
sleep was the dependent measure. Schematically, we have 


Control Drug Type I Drug Type II Drug Type III 


Subjects 1 
2 


10 
Drug Type I was distinctly different from the remaining two drugs, which were 
somewhat similar in composition. Three relevant questions here are: 


1. Does drug have a different effect on duration of sleep than no drug? 

2. Does drug Type I produce a different effect from types II and III? 

3. Do drug types II and III, which are similar, have a differential effect on 
sleep? 


These questions correspond to the following contrasts on the repeated measures: 


         μ1      μ2      μ3      μ4 
ψ1        1     -.33    -.33    -.33 
ψ2        0       1     -.50    -.50 
ψ3        0       0       1      -1 


Notice in the above that each level of the repeated measure is contrasted against 
the average of the remaining levels. Contrasts of this kind are called Helmert 
contrasts. They are built into the SPSS and SAS packages. All one need do is request 
them. In Table 5.12 we present the complete command syntax for running 
the Helmert contrasts on SPSS MANOVA. 
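
The pattern of Helmert weights generalizes to any number of levels: each level gets weight 1, and every later level gets the negative of one over the number of later levels. The sketch below, in Python for illustration only (the function name helmert_contrasts is hypothetical and this is not how either package builds the contrasts internally), generates the weights as exact fractions and confirms that the rows sum to zero and are mutually orthogonal.

```python
from fractions import Fraction
from itertools import combinations


def helmert_contrasts(k):
    """Return k - 1 Helmert contrasts for k levels as rows of exact fractions."""
    rows = []
    for i in range(k - 1):
        remaining = k - i - 1  # number of levels after level i
        row = (
            [Fraction(0)] * i
            + [Fraction(1)]
            + [Fraction(-1, remaining)] * remaining
        )
        rows.append(tuple(row))
    return rows


contrasts = helmert_contrasts(4)
for row in contrasts:
    print([str(w) for w in row])

# Each row is a contrast (weights sum to zero) ...
row_sums = [sum(row) for row in contrasts]
# ... and every pair of rows has a zero cross product.
cross_products = [
    sum(a * b for a, b in zip(u, v)) for u, v in combinations(contrasts, 2)
]
```

For four levels the three generated rows are the weights shown above, with -.33 and -.50 appearing as the exact fractions -1/3 and -1/2.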


5.15 SUMMARY 


1. Repeated measures designs are more powerful than completely random- 
ized designs, since the variability due to individual differences is removed 
from the error term, and individual differences are the major reason for er- 
ror variance. 

2. Two major advantages of repeated measures designs are increased preci- 
sion (because of the smaller error term), and economy of subjects. Two po- 
tential disadvantages are that the order of treatments may make a differ- 
ence (this can be dealt with by counterbalancing) and carryover effects. 

3. Either a univariate or multivariate approach can be used for repeated mea- 
sures analysis. If the sphericity assumption is tenable, then the univariate 
approach is preferred as it is more powerful. 

4. If sphericity is violated, then the type I error rate for the univariate approach 
is inflated. However, a modified univariate approach (obtained by 
multiplying each of the degrees of freedom by ε̂) yields an honest type I error 
rate. 

5. As both the modified univariate and multivariate approaches control type I 
error, the choice between them involves the issue of power. To keep things 
simpler in this text, I simply illustrate and use the modified univariate approach. 
However, as I point out in my multivariate text, neither approach is 
even usually more powerful, and therefore I recommend there that both approaches 
should be used, since they may differ in the effects they will discern. 

6. If sphericity is tenable, then the Tukey is a good post hoc procedure for locating 
pairwise differences. If sphericity is not tenable, then the Bonferroni 
approach should be used. That is, do multiple correlated t tests, but use the 
Bonferroni Inequality to keep the overall α level under control. 

7. When several groups are involved, then an additional assumption is homogeneity 
of the covariance matrices for the groups. This can be checked with 
the Box test, and would be of most concern when the group sizes are 
sharply unequal. 

EXERCISES 


1. Consider the following data for a single group repeated measures with 8 
subjects measured for 4 treatments: 


Treatments 

Subjects 1 2 3 4 
1 5 6 2 5 
2 3 4 1 6 
3 3 7 4 10 
4 6 8 3 3 
5 4 9 7 8 
6 5 7 4 9 
7 2 10 1 2 
8 4 3 2 5 


(a) Do a univariate repeated measures analysis on this data, testing for sig- 
nificance at the .05 level. 

(b) Use the Tukey post hoc procedure to locate the significant pairwise dif- 
ferences at the .05 level. 

(c) Run the above data on SPSS MANOVA to check your results. 


2. Give an example or two where a two or three within subjects design would
make sense. As a starter for you, consider driving behavior (measured by
number of steering errors) or smoking behavior (number of cigarettes
smoked) for a group of subjects, and a few key factors that you think might
influence such behavior.

3. Output from SPSS MANOVA for the single sample (5 subjects and 3 levels)
repeated measures design in Table 5.1 includes the following:


GREENHOUSE-GEISSER EPSILON = .66564 
HUYNH-FELDT EPSILON = .87240 
LOWER-BOUND EPSILON = .50000 


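The covariance matrix for the three measures is

$$S = \begin{bmatrix} 76.8 & 53.2 & 69.0 \\ 53.2 & 42.8 & 47.0 \\ 69.0 & 47.0 & 64.0 \end{bmatrix}$$

The formula for the Greenhouse-Geisser epsilon is:

$$\hat{\varepsilon} = \frac{k^2(\bar{s}_{ii} - \bar{s})^2}{(k-1)\left(\sum\sum s_{ij}^2 - 2k\sum \bar{s}_i^2 + k^2\bar{s}^2\right)}$$

where

$\bar{s}$ is the mean of all entries in the covariance matrix S
$\bar{s}_{ii}$ is the mean of entries on the main diagonal of S
$\bar{s}_i$ is the mean of all entries in row i of S
$s_{ij}$ is the ijth entry of S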


(a) Using this formula, verify the SPSS value of .66564. 
(b) Using the equation given in the chapter relating the
Greenhouse-Geisser and the Huynh-Feldt epsilons, verify the value of
.87240.
(c) Why is the LOWER-BOUND EPSILON value given as .500?
(d) For the one between and one within design in Table 5.7, the
Greenhouse-Geisser epsilon = .44629 and Huynh-Feldt epsilon = .54366
on the SPSS printout. A generalized formula relating these measures for
this design is given by

$$\tilde{\varepsilon} = \frac{ng(k-1)\hat{\varepsilon} - 2}{(k-1)[g(n-1) - (k-1)\hat{\varepsilon}]}$$

where g is the number of groups, n is the number of subjects per group, and
k is the number of levels for the within variable. Using this relationship,
show how the Huynh-Feldt value of .54366 follows from the
Greenhouse-Geisser value of .44629.
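
As a numerical cross-check on the two epsilon formulas in Exercise 3, the following Python sketch may be useful. It is not SPSS or SAS output; the function names `greenhouse_geisser` and `huynh_feldt` are illustrative choices, and the only data used are the covariance matrix and design sizes (5 subjects, 3 levels, one group) stated in the exercise.

```python
# Sketch: Greenhouse-Geisser and Huynh-Feldt epsilons computed directly
# from the formulas in Exercise 3. Function names are illustrative only.

def greenhouse_geisser(S):
    """Epsilon-hat from a k x k covariance matrix S (list of lists)."""
    k = len(S)
    grand_mean = sum(sum(row) for row in S) / (k * k)   # mean of all entries
    diag_mean = sum(S[i][i] for i in range(k)) / k      # mean of main diagonal
    row_means = [sum(row) / k for row in S]             # mean of each row
    sum_sq = sum(s * s for row in S for s in row)       # sum of squared entries
    numerator = k ** 2 * (diag_mean - grand_mean) ** 2
    denominator = (k - 1) * (
        sum_sq - 2 * k * sum(m * m for m in row_means) + k ** 2 * grand_mean ** 2
    )
    return numerator / denominator


def huynh_feldt(eps_hat, n, k, g=1):
    """Epsilon-tilde from epsilon-hat: n subjects per group, g groups, k levels."""
    numerator = n * g * (k - 1) * eps_hat - 2
    denominator = (k - 1) * (g * (n - 1) - (k - 1) * eps_hat)
    return numerator / denominator


# Covariance matrix for the three measures (Table 5.1 design).
S = [[76.8, 53.2, 69.0],
     [53.2, 42.8, 47.0],
     [69.0, 47.0, 64.0]]

eps_hat = greenhouse_geisser(S)
eps_tilde = huynh_feldt(eps_hat, n=5, k=3)

print(f"Greenhouse-Geisser epsilon = {eps_hat:.5f}")
print(f"Huynh-Feldt epsilon        = {eps_tilde:.5f}")
```

With g = 1 the generalized formula in part (d) reduces to the single group case, so the same function can be reused for part (d) by passing the group and cell sizes of the Table 5.7 design.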


4. Consider the following hypothetical data from a study comparing the rela- 
tive efficacy of a behavior modification approach to dieting vs. a behavior 
modification approach + exercise on weight loss for a group of overweight
women. There is also a control group. First, 18 women who are between 20 
and 30 years old are randomly assigned to one of the three groups. Then, 
six each of women 30 to 40 years old are randomly assigned to one of the 
three groups. The investigator wishes to determine whether age might 
moderate the effect of the diet approaches. The weight loss for the women 
is measured two months, four months, and six months after the diets begin. 
Thus, we have a two between and one within repeated measures design. 


DIET: 1 = control, 2 = behavior modification, 3 = behavior modification + exercise
AGE: 1 = 20-30 years, 2 = 30-40 years

DIET AGE WGTLOSS1 WGTLOSS2 WGTLOSS3
1.00 1.00 4.00 3.00 3.00
1.00 1.00 4.00 4.00 3.00
1.00 1.00 4.00 3.00 1.00
1.00 1.00 3.00 2.00 1.00
1.00 1.00 5.00 3.00 2.00
1.00 1.00 6.00 5.00 4.00
1.00 2.00 6.00 5.00 4.00
1.00 2.00 5.00 4.00 1.00
1.00 2.00 3.00 3.00 2.00
1.00 2.00 5.00 4.00 1.00
1.00 2.00 4.00 2.00 2.00
1.00 2.00 5.00 2.00 1.00
2.00 1.00 6.00 3.00 2.00
2.00 1.00 5.00 4.00 1.00
2.00 1.00 7.00 6.00 3.00
2.00 1.00 6.00 4.00 2.00
2.00 1.00 3.00 2.00 1.00
2.00 1.00 5.00 5.00 4.00
2.00 2.00 4.00 3.00 1.00
2.00 2.00 4.00 2.00 1.00
2.00 2.00 6.00 5.00 3.00
2.00 2.00 7.00 6.00 4.00
2.00 2.00 4.00 3.00 2.00
2.00 2.00 7.00 4.00 3.00
3.00 1.00 8.00 4.00 2.00
3.00 1.00 3.00 6.00 3.00
3.00 1.00 7.00 7.00 4.00
3.00 1.00 4.00 7.00 1.00
3.00 1.00 9.00 7.00 3.00
3.00 1.00 2.00 4.00 1.00
3.00 2.00 3.00 5.00 1.00
3.00 2.00 6.00 5.00 2.00
3.00 2.00 6.00 6.00 3.00
3.00 2.00 9.00 5.00 2.00
3.00 2.00 7.00 9.00 4.00
3.00 2.00 8.00 6.00 1.00

(a) Run the analysis on SPSS MANOVA, obtaining both the multivariate
and univariate results. 

(b) Which of the between effects are significant at the .05 level? 

(c) Given the values of the Greenhouse-Geisser and Huynh-Feldt 
epsilons, would the univariate or multivariate approach be preferred? 

(d) Which of the within effects are significant at the .05 level? 

(e) Using the appropriate means (cell, row or column), interpret the re- 
sults. 


5. Run the Helmert planned comparisons given in Table 5.12 on SPSS
MANOVA. If overall alpha is set at .10, then which are significant? What
do the significant contrasts represent?


6. Recall that in the Elashoff data example in section 5.12 two groups of
subjects were given three different doses of two drugs, which yielded a one
between and two within repeated measures design. Suppose that the two
groups of subjects had been given the different doses of the drugs under
two different conditions. Then we would have a one between and three
within design. Show the SPSS MANOVA control lines for running this
analysis.


7. Consider the following data. The dependent variable is Beck depression
score:


SUBJECT WINTER SPRING SUMMER FALL
1 7.50 11.55 1.00 1.21
2 7.00 9.00 5.00 15.00
3 1.00 1.00 .00 .00
4 .00 .00 .00 .00
5 1.06 .00 1.10 4.00
6 1.00 2.50 .00 2.00
7 2.50 .00 .00 2.00
8 4.50 1.06 2.00 2.00
9 5.00 2.00 3.00 5.00
10 2.00 3.00 4.21 3.00
11 7.00 7.35 5.88 9.00
12 2.50 2.00 .01 2.00
13 11.00 16.00 13.00 13.00
14 8.00 10.50 1.00 11.00


(a) Run this on SPSS or SAS as a single group repeated measures. Is it sig- 
nificant at the .05 level, assuming sphericity? 

8. A researcher is interested in the smoking behavior of a group of 30 men, 10
of whom are 30-40, 10 are 41-50, and the remaining 10 are 51-60. She
wishes to determine if how much they smoke is influenced by the time of 
the day (morning or afternoon) and by context (at home or in the office). 
The men are observed in each of the 4 situations and the number of ciga- 
rettes is recorded. She also wishes to determine whether the age of the men 
influences their smoking behavior. 

(a) What type of a repeated measures design is this? 
(b) Show the complete SPSS MANOVA control lines (put DATA for the 
data lines) for running the analysis. 


9. Consider the following data set: 


TREATMENTS 
1 2 3 
5 6 1 
3 4 2 
3 7 1 
6 8 3 
6 9 3 
4 7 2 
5 9 2 


(a) Do a single group repeated measures analysis on this data. 


Simple and Multiple Regression 
Appendix The PRESS Statistic


6.1 SIMPLE REGRESSION


Here we are predicting a dependent (outcome) variable from a single predictor. 
Several examples come to mind. One may wish to predict chemistry achievement 
from I.Q. One may wish to predict a person’s heart rate from blood pressure. A
farmer may wish to predict yield from level of the fertilizer. 

Before we get into simple regression, let us review some basic concepts from 
high school. Recall that in high school you may have been told to graph an
equation such as y = 1 + .5x. To do so you take a range of x values, determine the
corresponding y values, and then plot the points. It would look like this:


[Plot of the points on the line y = 1 + .5x for x = -2, -1, 0, 1, 2]


When you did this each y value was exactly determined once the x value was 
specified. You probably did not think a lot about that fact. This is called a determin- 
istic model. When we collect data and are attempting to predict some y (like col- 
lege GPA) from a single x (like high school GPA), it should be obvious that perfect 
prediction is not going to happen. Why? Because there are many other factors that 
determine college GPA, like what your major is, where you go to college, boy- 
friends and girlfriends, a death in the family, a divorce, attitude toward school, etc. 
Because of all these other factors we need to set up what is called a probabilistic 
model, where we allow for error in prediction. We will assume a linear relationship 
between y and x. The probabilistic model is as follows: 


$$y_i = \underline{\beta_0 + \beta_1 x_i} + e_i, \qquad i = 1, 2, \ldots, n$$


The part I have underlined corresponds to the linear relationship, and the other 
part is the error of prediction. 
To illustrate the above, consider the following data set and scatterplot: 
From the above plot it is obvious that a straight line will not fit the points
perfectly. Yet it is also clear that as x increases y increases and that the relationship
seems primarily linear. We wish to model this linear relationship, and we will see 
that a “least squares” regression line does a pretty good job. 

We consider two examples to illustrate simple regression. The first example 
uses artificial data for a small data set. The second example, based on real data, 
provides a more realistic use of simple regression in practice. 

Before we get into the examples, let us consider more precisely what is done in 
simple regression. First, we are assuming a linear relationship exists between the 
dependent variable and the predictor. This means there is a significant correlation 
between x and y. We are modeling y, assuming it is linearly related to x (predictor). 
The mathematical model looks like this: 


$$y_i = \beta_0 + \beta_1 x_i + e_i, \qquad i = 1, 2, \ldots, n$$


where $\beta_0$ and $\beta_1$ are to be estimated, and the $e_i$ are the errors of prediction. There
are assumptions concerning the $e_i$ which we get to later. How do we estimate the
$\beta$s? The least squares criterion is used; that is, the sum of the squared estimated
errors of prediction is minimized.


$$\hat{e}_1^2 + \hat{e}_2^2 + \cdots + \hat{e}_n^2 = \sum_{i=1}^{n} \hat{e}_i^2 = \min$$

Now, $\hat{e}_i = y_i - \hat{y}_i$, where $y_i$ is the actual score on the dependent variable and $\hat{y}_i$ is
the estimated score for the ith subject.


The score for each subject defines a point in the plane. What the least squares
criterion does is find the line that best fits the points. Geometrically this
corresponds to minimizing the sum of the squared vertical distances ($\hat{e}_i^2$) of each
subject's score from their estimated score. This is illustrated in Figure 6.1.


[Figure annotation: Least squares minimizes the sum of these squared vertical
distances, i.e., it finds the line which best fits the points.]

FIGURE 6.1 Geometrical Representation of Least Squares Criterion


Example 1 


The above is abstract. To give the reader a feel for the errors of prediction and just 
plotting of points, we consider our first example with just 11 data points. In Table 
6.1 we present selected printout from the SPSS for Windows 12.0 regression run of 
this data. First, the correlation of .841 shows that there is a strong linear relation- 
ship. Second, from the unstandardized coefficients we can construct the prediction 
equation. We have put that equation on the diagram below. Third, in Table 6.1 we 
have the unstandardized predicted values and errors. In Figure 6.2 we present the 
regression line, along with geometric illustrations of what some of the estimated y 
values look like and how the errors of prediction are obtained by simply taking the 
difference between the person's actual y score and the person's predicted score. 
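
For readers who want to reproduce the quantities described for Example 1 outside SPSS, here is a minimal Python sketch of the least squares calculation. It assumes the 11 (x, y) pairs listed with Figure 6.2 are the Example 1 data; the variable names are illustrative and nothing here is SPSS output.

```python
# Sketch: least squares line, correlation, predicted values and errors
# of prediction for the 11 data points of Example 1.
from math import sqrt

x = [2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13]
y = [3, 6, 8, 4, 10, 14, 8, 12, 14, 12, 16]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)                         # variation in x
syy = sum((yi - y_bar) ** 2 for yi in y)                         # variation in y
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))   # covariation

slope = sxy / sxx                    # least squares estimate of beta_1
intercept = y_bar - slope * x_bar    # least squares estimate of beta_0
r = sxy / sqrt(sxx * syy)            # simple correlation between x and y

predicted = [intercept + slope * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]   # errors of prediction

print(f"prediction equation: y-hat = {intercept:.3f} + {slope:.3f}x")
print(f"r = {r:.3f}")
for xi, yi, yhat, e in zip(x, y, predicted, residuals):
    print(f"x = {xi:2d}  y = {yi:2d}  predicted = {yhat:6.3f}  error = {e:6.3f}")
```

Each error is simply the actual y minus the predicted y, and with an intercept in the model the errors sum to zero.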


Example 2 


For our second example, using real data, we consider part of a Sesame Street
database from Glasnapp and Poggio (1985), who present data on many variables,
including 12 background variables and 8 achievement variables, for 240 subjects.
Sesame Street was developed as a television series aimed mainly at teaching
preschool skills to three to five year old children. Data was collected on many
achievement variables both before (pretest) and after (posttest) viewing of the series. We
consider here only one of the achievement variables, knowledge of body parts. In
particular, we consider pretest and posttest data on body parts for a sample of 80
children.


TABLE 6.1
Selected Simple Regression Output From SPSS for Windows 12.0


Placeholder for T0601 on p. 243 of previous edition.


TABLE 6.2
SPSS Command Syntax for Simple Regression on Sesame Street Data
and Selected Printout


TITLE 'SIMPLE REGRESSION ON SESAME DATA'.
DATA LIST FREE/PREBODY POSTBODY.
BEGIN DATA.
DATA LINES
END DATA.
① REGRESSION DESCRIPTIVES = DEFAULT/
  VARIABLES = PREBODY POSTBODY/
  DEPENDENT = POSTBODY/
  METHOD = ENTER/
② SCATTERPLOT (POSTBODY, PREBODY) /.


① DESCRIPTIVES = DEFAULT subcommand yields the means, standard deviations and the
correlation matrix for the variables.
② This SCATTERPLOT subcommand yields the scatterplot for the variables. Note that the
variables have been standardized (z scores) and then plotted.
(continued)


x y 
2.00 3.00 
3.00 6.00 
4.00 8.00 
6.00 4.00 
7.00 10.00 
8.00 14.00 
9.00 8.00
10.00 12.00
11.00 14.00
12.00 12.00 
13.00 16.00 


FIGURE 6.2 Errors of Prediction 


TABLE 6.2 
(Continued) 


Placeholder for T0602b from p. 245 of previous edition 


③ This legend means there is one observation whenever a single dot appears, two observations
whenever a : appears, and 5 observations where there is an asterisk (*).

④ The multiple correlation here is in fact the simple correlation between postbody and prebody,
since there is just one predictor.

⑤ These are the raw coefficients which define the prediction equation: POSTBODY = .50197
PREBODY + 14.6888.
[Four panels, each plotting residuals against predicted values: (a) plot when
model is correct; (b) model violation: nonlinearity; (c) model violation:
nonconstant variance; (d) model violation: nonlinearity and nonconstant variance]


FIGURE 6.3 Plots of Residuals vs. Predicted Values


The command syntax for running the regression analysis, along with selected 
printout, is presented in Table 6.2. Part of the printout is a standardized scatterplot 
(recall that when variables are standardized this does not affect the magnitude of 
the correlation). 


6.2 ASSUMPTIONS FOR THE ERRORS 


The errors ($e_i$) are assumed to be independent, with constant variance and normally
distributed with a mean of 0. If these assumptions are valid for a given set of data,
then the estimated errors ($\hat{e}_i$), called the residuals, should behave similarly. There
are various plots involving the residuals that are available for assessing potential
problems with a linear regression model. One of the most useful plots, in my
opinion, involves graphing the residuals against the predicted values. If the
assumptions of the regression model are tenable, then the residuals should scatter
randomly about a horizontal line of 0. Any systematic pattern or clustering of the
residuals suggests a model violation(s). In Figure 6.3 I present four plots: one in
which the assumptions are tenable, while in the other 3 plots there is a model
violation(s). We obtained this plot for the first data example, and the results are
presented in Figure 6.4. This plot indicates that the assumptions are tenable for this set
of data.


Placeholder for 20604 from p. 247 of previous edition


FIGURE 6.4 Plots of Residuals vs. Predicted Values for Example 1 Data


6.3 INFLUENTIAL DATA POINTS 


There is one additional point we wish to make before moving into multiple regres- 
sion. There are situations where a single point may have a big influence on the re- 
sulting prediction equation; such a point is called an influential point. A statistic 
that is quite useful for detecting such influential points is called Cook's distance. 
Cook and Weisberg (1982) indicate that if Cook's distance (which is readily ob- 
tained from SPSS or SAS) is > 1, then generally that point will be influential. As a 
vivid illustration of such a point, consider Case B from Chapter 1, Section 1.6. The 
last data point (24,5) is an outlier. Without that point we get a nice regression line. 
When that point is included, however, it pulls the regression line down consider- 
ably. The two regression lines are shown in Figure 6.5. 
FIGURE 6.5 Regression Lines With and Without Influential Point


How Summary Statistics Can Be Misleading 


One of the reviewers of this second edition noted that data sets provided by 
Anscombe (1973) can show why summary statistics can be misleading, and hence 
the need for plotting the data. I commented on these data sets in the first edition of 
my multivariate text (Stevens, 1986, p. 86), but did not show the plots. The actual 
data are as follows: 


X Y1 Y2 Y3
4 4.26 3.10 5.39
5 5.68 4.74 5.73
6 7.24 6.13 6.08
7 4.82 7.26 6.42
8 6.95 8.14 6.77
9 8.81 8.77 7.11
10 8.04 9.14 7.46
11 8.33 9.26 7.81
12 10.84 9.13 8.15
13 7.58 8.74 12.74
14 9.96 8.10 8.84


[Scatterplots of Y1, Y2, and Y3 against X]


FIGURE 6.6 Plots for Anscombe (1973) Data Sets 
These data sets have exactly the same correlation (.816) and the same
regression line: y = 3 + .5x. Yet the situations are quite different. The plots, from SPSS for
Windows 12.0, are given in Figure 6.6. These plots show that only in the first case 
are the summary statistics an accurate indication of the situation. In the second 
case there is a curvilinear relationship, and in the last case there is an outlier. 
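
The identical summary statistics, and the influence of the outlier in the third data set, can be checked with a short Python sketch. The helper names `fit_line` and `cooks_distances` are illustrative. The Cook's distance expression used is the usual one for a regression with p estimated coefficients, D_i = [e_i² / (p · MS_res)] · h_i / (1 − h_i)², with leverage h_i = 1/n + (x_i − x̄)² / Σ(x − x̄)² in simple regression; it is stated here as an assumption about what the packages compute, not as a transcript of their output.

```python
# Sketch: summary statistics for the three Anscombe data sets, plus
# Cook's distance for each case in the third set.
from math import sqrt

x = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
y1 = [4.26, 5.68, 7.24, 4.82, 6.95, 8.81, 8.04, 8.33, 10.84, 7.58, 9.96]
y2 = [3.10, 4.74, 6.13, 7.26, 8.14, 8.77, 9.14, 9.26, 9.13, 8.74, 8.10]
y3 = [5.39, 5.73, 6.08, 6.42, 6.77, 7.11, 7.46, 7.81, 8.15, 12.74, 8.84]


def fit_line(x, y):
    """Return (intercept, slope, r) for the least squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / sqrt(sxx * syy)


def cooks_distances(x, y):
    """Cook's distance for every case in a simple regression (p = 2)."""
    n = len(x)
    intercept, slope, _ = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ms_res = sum(e * e for e in resid) / (n - 2)      # mean square residual
    distances = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx           # leverage of this case
        distances.append(e * e / (2 * ms_res) * h / (1 - h) ** 2)
    return distances


for label, y in (("Y1", y1), ("Y2", y2), ("Y3", y3)):
    b0, b1, r = fit_line(x, y)
    print(f"{label}: y-hat = {b0:.2f} + {b1:.2f}x, r = {r:.3f}")

d3 = cooks_distances(x, y3)
worst = d3.index(max(d3))
print(f"largest Cook's distance in Y3: case x = {x[worst]}, D = {d3[worst]:.2f}")
```

In the third set the case at X = 13 is the only one whose Cook's distance exceeds the value of 1 suggested by Cook and Weisberg (1982).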


6.4 MULTIPLE REGRESSION 


In multiple regression we are interested in predicting a dependent variable from a 
set of predictors. Since human behavior is complex and influenced by many
factors, single predictor studies are limited in their predictive power. For example, in a
college GPA study, we are able to predict college GPA better by considering
predictors other than high school GPA. Some other factors would be scores on
standardized tests (verbal and quantitative), and some noncognitive variables, such as
study habits and attitude toward education. That is, we look to other predictors (of- 
ten test scores) that tap other aspects of criterion behavior. 
Consider three other examples of multiple regression studies: 


1. Feshbach, Adelman, and Williamson (1977) conducted a study of 850 mid- 
dle class children. The children were measured in kindergarten on a battery of vari- 
ables: WPPSI, deHirsch—Jansky Index (assessing various linguistic and percep- 
tual motor skills), the Bender Motor Gestalt, and a Student Rating Scale developed 
by the authors that measures various cognitive and affective behaviors and skills. 
These measures were used to predict reading achievement for these same children 
in grades 1, 2, and 3. 

2. Crystal (1988) attempted to predict chief executive officer (CEO) pay for the 
top 100 of last year’s Fortune 500 and the 100 top entries from last year’s Service 
500. He used the following predictors: company size, company performance, com- 
pany risk, government regulation, tenure, location, directors, ownership, and age. 
He found that only about 39% of the variance in CEO pay can be accounted for by 
these factors. 

3. Agresti (1990) gives an example based on real data for 93 homes that were 
sold in Florida. The dependent variable is price of the home and the predictors 
were size, number of bathrooms, number of bedrooms, and whether the home was 
new or not. 


In discussing simple regression we mentioned that least squares was used to es- 
timate the parameters and that this procedure minimized the sum of the squared er- 
rors of prediction. In multiple regression we will use least squares again. It is very 
important for the reader to realize that minimizing the sum of the squared errors of 
prediction is equivalent to maximizing the correlation between the observed and 
predicted scores. This maximized correlation is called the multiple correlation,
i.e., $R = r_{y\hat{y}}$. Nunnally (1978) characterized the procedure as "wringing out the
last ounce of predictive power" (obtained from the linear combination of the xs,
i.e., from the regression equation). Since the correlation is maximum for the
sample from which it is derived, when the regression equation is applied to an
independent sample from the same population (i.e., cross-validated) the predictive
power drops off. If the predictive power drops off sharply, then the equation is of 
very limited utility. That is, it has little generalizability, and hence is of limited sci- 
entific value. After all, we derive the prediction equation for the purpose of pre- 
dicting with it on future (other) samples. If the equation does not predict well on 
other samples, then it is not fulfilling the purpose for which it was designed. 

Sample size (n) and the number of predictors (k) are two crucial factors 
that determine how well a given equation will cross validate. In particular, the 
n/k ratio is crucial. For small ratios (5:1 or less) the shrinkage can be substantial.

Since the rest of this chapter is rather lengthy, we give the reader an overview of 
the critical topics. As we will show shortly, how the predictors are correlated can 
have a big impact on the multiple correlation. This will take us into the topic of 
multicollinearity (Section 6.7). In Section 6.8 we discuss several methods for se- 
lecting a “good” set of predictors. In Section 6.9 we give two computer examples, 
using real data, to illustrate some of the methods discussed in Section 6.8. In Sec- 
tion 6.10 we discuss assumptions underlying the regression analysis, and how they 
can be checked. The crucial topic of model validation is discussed in Section 6.11. 
Since multiple regression is a mathematical maximization procedure, it is very im- 
portant to check the generalizability of the equation. 


6.5 BREAKDOWN OF SUM OF SQUARES 
IN REGRESSION AND F TEST 
FOR MULTIPLE CORRELATION 


In analysis of variance we broke down variability about the grand mean into be- 
tween and within variability. In regression analysis variability about the mean is 
broken down into variability due to regression and variability about the regression. 
To get at the breakdown, we start with the following identity: 


$$y_i - \bar{y} = (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})$$

Now we square both sides, obtaining

$$(y_i - \bar{y})^2 = [(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})]^2$$


Then we sum over the subjects, from 1 to n:

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} [(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})]^2$$

By algebraic manipulation (see Draper & Smith, 1981, pp. 17-18), this can be
rewritten as:


$$\sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2$$

sum of squares about mean = sum of squares about regression ($SS_{res}$) + sum of squares due to regression ($SS_{reg}$)

df: n - 1 = (n - k - 1) + k    (df = degrees of freedom)

This results in the following analysis of variance table and the F test for deter- 
mining whether the population multiple correlation is different from 0. 


Source             SS       df           MS                    F
Regression         SSreg    k            SSreg/k               MSreg/MSres
Residual (error)   SSres    n - k - 1    SSres/(n - k - 1)


Recall that since the residual for each subject is $\hat{e}_i = y_i - \hat{y}_i$, the mean square
error term can be written as $MS_{res} = \sum \hat{e}_i^2 / (n - k - 1)$. Now, $R^2$ (squared multiple
correlation) is given by:

$$R^2 = \frac{\text{sum of squares due to regression}}{\text{sum of squares about the mean}} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2} = \frac{SS_{reg}}{SS_{tot}}$$


Thus, $R^2$ measures the proportion of total variance on y that is accounted for by
the set of predictors. By simple algebra then we can rewrite the F test in terms of $R^2$
as follows:


$$F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)} \quad \text{with } k \text{ and } (n - k - 1) \text{ df} \qquad (1)$$


We feel this test is of limited utility, since it does not necessarily imply that the 
equation will cross-validate well, and this is the crucial issue in regression analy- 
sis. 
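
The breakdown of the sum of squares, and the equivalence of the two forms of the F test, can be verified numerically. The sketch below reuses the 11 points of Example 1 (so k = 1); the variable names are illustrative, and the point is only to show the identities holding on real numbers, not to reproduce any package printout.

```python
# Sketch: SS about the mean = SS about regression + SS due to regression,
# and F computed from mean squares equals F computed from R squared.
x = [2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13]
y = [3, 6, 8, 4, 10, 14, 8, 12, 14, 12, 16]
n, k = len(x), 1                       # 11 subjects, one predictor

x_bar, y_bar = sum(x) / n, sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

ss_tot = sum((yi - y_bar) ** 2 for yi in y)                  # about the mean
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))     # about regression
ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)              # due to regression

r_squared = ss_reg / ss_tot
f_from_ms = (ss_reg / k) / (ss_res / (n - k - 1))            # MSreg / MSres
f_from_r2 = (r_squared / k) / ((1 - r_squared) / (n - k - 1))

print(f"SStot = {ss_tot:.3f}, SSres + SSreg = {ss_res + ss_reg:.3f}")
print(f"R squared = {r_squared:.3f}")
print(f"F (mean squares) = {f_from_ms:.2f}, F (R squared) = {f_from_r2:.2f}")
```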
Example 3


An investigator obtains $R^2 = .50$ on a sample of 50 subjects with 10 predictors. Do
we reject the null hypothesis that the population multiple correlation = 0?


$$F = \frac{.50/10}{(1 - .50)/(50 - 10 - 1)} = 3.9 \quad \text{with 10 and 39 df}$$


This is significant at .01 level, since the critical value is 2.8. 

However, since the n/k ratio is only 5/1, the prediction equation will probably 
not predict well on other samples and is therefore of questionable utility. 
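
The arithmetic of Example 3 can be wrapped in a small helper, sketched here in Python. The function name `f_from_r_squared` is an illustrative choice; the formula is Equation 1.

```python
# Sketch: F statistic for testing the population multiple correlation,
# computed from R squared, sample size n, and number of predictors k.

def f_from_r_squared(r_squared, n, k):
    """Return (F, df1, df2) for Equation 1."""
    df1, df2 = k, n - k - 1
    f = (r_squared / df1) / ((1 - r_squared) / df2)
    return f, df1, df2


f, df1, df2 = f_from_r_squared(0.50, n=50, k=10)
print(f"F = {f:.1f} with {df1} and {df2} df, n/k ratio = {50 / 10:.0f}:1")
```

The F value says nothing about the n/k ratio, which is why a significant result here still leaves the cross-validation question open.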

Myers' (1990) response to the question of what constitutes an acceptable value
for $R^2$ is illuminating:


This is a difficult question to answer, and, in truth, what is acceptable depends on the
scientific field from which the data were taken. A chemist, charged with doing a
linear calibration on a high precision piece of equipment, certainly expects to
experience a very high $R^2$ value (perhaps exceeding .99), while a behavioral scientist,
dealing in data reflecting human behavior, may feel fortunate to observe an $R^2$ as high as
.70. An experienced model fitter senses when the value of $R^2$ is large enough, given
the situation confronted. Clearly, some scientific phenomena lend themselves to
modeling with considerably more accuracy than others. (p. 37)


His point is that how well one can predict depends on context. In the physical 
sciences, generally quite accurate prediction is possible. In the social sciences, 
where we are attempting to predict human behavior (which can be influenced by 
many systematic and some idiosyncratic factors), prediction is much more diffi- 
cult. 


6.6 RELATIONSHIP OF SIMPLE CORRELATIONS 
TO MULTIPLE CORRELATION 


The ideal situation, in terms of obtaining a high R would be to have each of the pre- 
dictors significantly correlated with the dependent variable and for the predictors 
to be uncorrelated with each other, so that they measure different constructs and 
are able to predict different parts of the variance on y. Of course, in practice we will 
not find this because almost all variables are correlated to some degree. A good sit- 
uation in practice then would be one in which most of our predictors correlate sig- 
nificantly with y and the predictors have relatively low correlations among them- 
selves. To illustrate the above points further, consider the following three patterns 
of intercorrelations for three predictors. 
            X1    X2    X3               X1    X2    X3               X1    X2    X3
(1)   Y    .20   .10   .30     (2)   Y  .60   .50   .70     (3)   Y  .60   .70   .70
      X1         .50   .40           X1       .20   .30           X1       .70   .60
      X2               .60           X2             .20           X2             .80

In which of these cases would you expect the multiple correlation to be the larg- 
est and the smallest respectively? Here it is quite clear that R will be the smallest 
for 1 because the highest correlation of any of the predictors with y is .30, whereas 
for the other two patterns at least one of the predictors has a correlation of .70 with 
y. Thus, we know that R will be at least .70 for cases 2 and 3, whereas for case 1 we 
only know that R will be at least .30. Furthermore, there is no chance that R for case 
1 might become larger than that for cases 2 and 3, because the intercorrelations 
among the predictors for 1 are approximately as large or larger than those for the 
other two cases. 

We would expect R to be largest for case 2 because each of the predictors is
moderately to strongly tied to y and there are low intercorrelations (i.e., little
redundancy) among the predictors, exactly the kind of situation we would hope to
find in practice. We would expect R to be greater in case 2 than in case 3, because in
case 3 there is considerable redundancy among the predictors. Although the
correlations of the predictors with y are slightly higher in case 3 (.60, .70, .70) than in
case 2 (.60, .50, .70), the much higher intercorrelations among the predictors for
case 3 will severely limit the ability of X2 and X3 to predict additional variance
beyond that of X1 (and hence significantly increase R), whereas this will not be true
for case 2.
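
The expected ordering of R across the three patterns can be confirmed by computing R directly from the correlations. The Python sketch below relies on the standard result that the standardized regression weights solve $R_{xx}\beta = r_{xy}$ and that $R^2 = \sum \beta_j r_{yj}$; the helper names `solve` and `multiple_r` are illustrative, and the small Gaussian elimination routine is included only to keep the example self-contained.

```python
# Sketch: multiple correlation R from the predictor intercorrelations
# and the predictor-criterion correlations, for the three patterns.
from math import sqrt


def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]        # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):                      # back substitution
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def multiple_r(r_xx, r_xy):
    """R for predicting y from predictors with intercorrelations r_xx."""
    betas = solve(r_xx, r_xy)                           # standardized weights
    return sqrt(sum(b * r for b, r in zip(betas, r_xy)))


patterns = {
    1: ([[1, .5, .4], [.5, 1, .6], [.4, .6, 1]], [.20, .10, .30]),
    2: ([[1, .2, .3], [.2, 1, .2], [.3, .2, 1]], [.60, .50, .70]),
    3: ([[1, .7, .6], [.7, 1, .8], [.6, .8, 1]], [.60, .70, .70]),
}
results = {case: multiple_r(r_xx, r_xy) for case, (r_xx, r_xy) in patterns.items()}
for case, r in results.items():
    print(f"case {case}: R = {r:.3f}")
```

Case 2 yields the largest R and case 1 the smallest, and in case 3 the two additional predictors raise R only modestly above the best single correlation of .70.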


6.7 MULTICOLLINEARITY 


When there are moderate to high intercorrelations among the predictors, as is the 
case when several cognitive measures are used as predictors, the problem is re- 
ferred to as multicollinearity. Multicollinearity poses a real problem for the re- 
searcher using multiple regression for three reasons: 


1. It severely limits the size of R, because the predictors are going after much
of the same variance on y. A study by Dizney and Gromen (1967) illustrates very 
nicely how multicollinearity among the predictors limits the size of R. They stud- 
ied how well reading proficiency (x1) and writing proficiency (x2) would predict 
course grade in college German. The following correlation matrix resulted: 
        x1      x2      y
x1     1.00    .58     .33
x2             1.00    .45
y                      1.00


Note the multicollinearity for x1 and x2 ($r_{x_1 x_2} = .58$), and also that x2 has a
simple correlation of .45 with y. The multiple correlation R was only .46. Thus, the
relatively high correlation between reading and writing severely limited the ability of
reading to add anything of consequence (only .01) to the prediction of German grade above
and beyond that of writing.

2. Multicollinearity makes determining the importance of a given predictor 
difficult because the effects of the predictors are confounded due to the correla- 
tions among them. 

3. Multicollinearity increases the variances of the regression coefficients. The 
greater these variances, the more unstable the prediction equation will be. 
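
The Dizney and Gromen result in the first reason above can be reproduced from the three correlations alone. With two predictors, $R^2 = (r_{y1}^2 + r_{y2}^2 - 2 r_{y1} r_{y2} r_{12}) / (1 - r_{12}^2)$. The Python sketch below uses that formula; the function name is illustrative, and the "uncorrelated predictors" line is a hypothetical comparison, not a result from their study.

```python
# Sketch: multiple correlation for two predictors, applied to the
# Dizney and Gromen (1967) correlations.
from math import sqrt


def r_two_predictors(r_y1, r_y2, r_12):
    """R for predicting y from two predictors with intercorrelation r_12."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)


r_observed = r_two_predictors(0.33, 0.45, 0.58)   # reading, writing, r12 = .58
r_if_uncorrelated = r_two_predictors(0.33, 0.45, 0.0)   # hypothetical: r12 = 0

print(f"R with r12 = .58: {r_observed:.2f}")
print(f"gain over writing alone: {r_observed - 0.45:.2f}")
print(f"R if the predictors were uncorrelated (hypothetical): {r_if_uncorrelated:.2f}")
```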


The following are two methods for diagnosing multicollinearity: 


1. Examine the simple correlations among the predictors from the correlation 
matrix. These should be observed, and are easy to understand, but the re- 
searcher need be warned that they do not always indicate the extent of 
multicollinearity. More subtle forms of multicollinearity may exist. One 
such more subtle form is discussed next. 

2. Examine the variance inflation factors for the predictors. 


The quantity $1/(1 - R_j^2)$ is called the jth variance inflation factor, where $R_j^2$ is the
squared multiple correlation for predicting the jth predictor from all other
predictors.

The variance inflation factor for a predictor indicates whether there is a strong 
linear association between it and all the remaining predictors. It is distinctly possi- 
ble for a predictor to have only moderate and/or relatively weak associations with 
the other predictors in terms of simple correlations, and yet to have a quite high R 
when regressed on all the other predictors. When is the value for a variance infla- 
tion factor large enough to cause concern? Myers (1990) offers the following sug- 
gestion: “Though no rule of thumb on numerical values is foolproof, it is generally 
believed that if any VIF exceeds 10, there is reason for at least some concern; then 
one should consider variable deletion or an alternative to least squares estimation 
to combat the problem" (p. 369). The variance inflation factors are easily obtained 
from SAS REG (cf. Table 6.6). 
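
To make the definition concrete, here is a Python sketch that computes the variance inflation factors for three predictors from their intercorrelations. For illustration it borrows the intercorrelations of the third pattern in Section 6.6 (.70, .60, .80); each $R_j^2$ is obtained with the two-predictor formula, since each predictor is regressed on the other two. The function names are illustrative, and this is not a substitute for the SAS REG output.

```python
# Sketch: variance inflation factors 1 / (1 - R_j^2) for three predictors,
# using the predictor intercorrelations from pattern 3 of Section 6.6.

def r_squared_two_predictors(r_a, r_b, r_ab):
    """Squared multiple correlation for predicting a variable from two others.

    r_a and r_b are its correlations with the two others; r_ab is the
    correlation between those two.
    """
    return (r_a ** 2 + r_b ** 2 - 2 * r_a * r_b * r_ab) / (1 - r_ab ** 2)


r12, r13, r23 = 0.70, 0.60, 0.80

vif = {
    "X1": 1 / (1 - r_squared_two_predictors(r12, r13, r23)),  # X1 on X2, X3
    "X2": 1 / (1 - r_squared_two_predictors(r12, r23, r13)),  # X2 on X1, X3
    "X3": 1 / (1 - r_squared_two_predictors(r13, r23, r12)),  # X3 on X1, X2
}
for name, value in vif.items():
    print(f"VIF for {name} = {value:.2f}")
```

Even with simple correlations as high as .80, all three values fall well below the figure of 10 mentioned by Myers (1990).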

There are at least three ways of combating multicollinearity. One way is to com- 
bine predictors that are highly correlated. For example, if there are three measures 
relating to a single construct which have intercorrelations of about .80 or larger, 
then add them to form a single predictor. The two other ways (factor analysis and
ridge regression) are more advanced; see Stevens (1996). 


6.8 MODEL SELECTION 


There are various methods available for selecting a good set of predictors: 


Substantive Knowledge 


As Weisberg (1985) noted, “The single most important tool in selecting a subset of 
variables for use in a model is the analyst’s knowledge of the substantive area un- 
der study" (p. 210). It is important for the investigator to be judicious in his/her se- 
lection of predictors. Far too many investigators have abused multiple regression 
by “throwing everything in the hopper,” often merely because the variables are 
available. Cohen (1990), among others, commented on the indiscriminate use of 
variables: "I have encountered too many studies with prodigious numbers of
dependent variables, or with what seemed to me far too many independent
variables, or (heaven help us) both."

There are several good reasons for generally preferring to work with a small 
number of predictors: (a) principle of scientific parsimony, (b) reducing the num- 
ber of predictors improves the n/k ratio, and this helps cross validation prospects, 
and (c) note the following from Lord and Novick (1968): 


Experience in psychology and in many other fields of application has shown that it is 
seldom worthwhile to include very many predictor variables in a regression equation, 
for the incremental validity of new variables, after a certain point, is usually very low. 
This is true because tests tend to overlap in content and consequently the addition of a 
fifth or sixth test may add little that is new to the battery and still relevant to the crite- 
rion. (p. 274) 


Or consider the following from Ramsey and Schafer (p. 325): 


There are two good reasons for paring down a large number of exploratory variables 
to a smaller set. The first reason is somewhat philosophical: simplicity is preferable to 
complexity. Thus, redundant and unnecessary variables should be excluded on principle. 
The second reason is more concrete: unnecessary terms in the model yield less 
precise inferences. 


Sequential Methods 


These are the forward, stepwise, and backward selection procedures that are very 
popular with many researchers. All these procedures involve a partialling-out process; 
i.e., they look at the contribution of a predictor with the effects of the other 
predictors partialled out, or held constant. Many readers may have been exposed in 
a previous statistics course to the notion of a partial correlation, but a review is 
nevertheless in order. 

The partial correlation between variables 1 and 2 with variable 3 partialled from 
both 1 and 2 is the correlation with variable 3 held constant, as the reader may 
recall. The formula for the partial correlation is given by: 


\[
r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}
\]


Let us put this in the context of multiple regression. Suppose we wish to know 
what the partial of y (dependent variable) is with predictor 2 with predictor 1 
partialled out. The formula would be, following what we have above: 


\[
r_{y2.1} = \frac{r_{y2} - r_{y1}\,r_{21}}{\sqrt{1 - r_{y1}^2}\,\sqrt{1 - r_{21}^2}}
\]


We apply this formula to show how SPSS obtains the partial correlation of .528 
for INTEREST in Table 6.4 under EXCLUDED VARIABLES in the first upcoming 
computer example. In this example CLARITY (abbreviated as clr) entered first, 
having a correlation of .862 with dependent variable INSTEVAL (abbreviated as 
inst). The correlations below are taken from the correlation matrix, given near the 
beginning of Table 6.4. 


\[
r_{\text{inst int.clr}} = \frac{.435 - (.862)(.20)}{\sqrt{1 - .862^2}\,\sqrt{1 - .20^2}}
\]
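
The arithmetic in this formula can be checked with a few lines of code. The sketch below is only an illustration: the function name `partial_corr` is made up for this example, and the three inputs are the rounded correlations quoted from the matrix, so the result can differ from the SPSS value in the third decimal place.

```python
import math


def partial_corr(r_ab: float, r_ac: float, r_bc: float) -> float:
    """First-order partial correlation of a and b with c partialled out of both."""
    numerator = r_ab - r_ac * r_bc
    denominator = math.sqrt(1 - r_ac ** 2) * math.sqrt(1 - r_bc ** 2)
    return numerator / denominator


r_inst_int = 0.435  # INSTEVAL with INTEREST
r_inst_clr = 0.862  # INSTEVAL with CLARITY
r_int_clr = 0.200   # INTEREST with CLARITY

r = partial_corr(r_inst_int, r_inst_clr, r_int_clr)
print(f"partial r of INSTEVAL and INTEREST, CLARITY held constant: {r:.4f}")
```

The value comes out close to .53, in line with the .528 on the printout; the small gap presumably reflects the rounding of the input correlations to three decimals.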


The correlation between the two predictors is .20, as shown above. 
We now give a brief description of the forward, stepwise and backward selec- 
tion procedures. 


FORWARD—The first predictor that has an opportunity to enter the equation is 
the one with the largest simple correlation with y. If this predictor is significant, 
then the predictor with the largest partial correlation with y is considered, etc. 
At some stage a given predictor will not make a significant contribution and the 
procedure terminates. It is important to remember that with this procedure, once 
a predictor gets into the equation it stays. 

STEPWISE—This is basically a variation on the forward selection procedure. 
However, at each stage of the procedure a test is made of the least useful predictor. 
The importance of each predictor is constantly reassessed. Thus, a predictor 
that may have been the best entry candidate earlier may now be superfluous. 

BACKWARD—The steps are as follows: (a) An equation is computed with 
ALL the predictors. (b) The partial F is calculated for every predictor, treated as 
though it were the last predictor to enter the equation. (c) The smallest partial F 
value, say $F_1$, is compared to a preselected significance value, say $F_0$. If 
$F_1 < F_0$, remove that predictor and recompute the equation with the remaining 
variables. Reenter stage (b). 
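
The backward steps (a) to (c) can be sketched in code. This is a simplified illustration under stated assumptions, not the SPSS or SAS algorithm: it works from a correlation matrix rather than raw data, it uses a fixed F-to-remove threshold `f_out` (SPSS compares p values against POUT instead), and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(rxx, rxy, subset):
    """R^2 for the predictors in `subset`, computed from correlations only."""
    if not subset:
        return 0.0
    sub = [[rxx[i][j] for j in subset] for i in subset]
    rhs = [rxy[i] for i in subset]
    betas = solve(sub, rhs)  # standardized regression weights
    return sum(b * r for b, r in zip(betas, rhs))


def backward_select(rxx, rxy, n, f_out):
    """Backward elimination with a fixed F-to-remove threshold `f_out`."""
    current = list(range(len(rxy)))  # (a) start with ALL predictors
    while current:
        k = len(current)
        r2_full = r_squared(rxx, rxy, current)
        mse = (1.0 - r2_full) / (n - k - 1)
        # (b) partial F for each predictor, as if it were the last to enter
        partial_f = {
            j: (r2_full - r_squared(rxx, rxy, [i for i in current if i != j])) / mse
            for j in current
        }
        # (c) compare the smallest partial F with the threshold
        weakest = min(partial_f, key=partial_f.get)
        if partial_f[weakest] >= f_out:
            break
        current.remove(weakest)  # drop it and go back to stage (b)
    return current


# Hypothetical two-predictor illustration: predictor 1 is unrelated to y.
rxx = [[1.0, 0.0], [0.0, 1.0]]
rxy = [0.6, 0.0]
kept = backward_select(rxx, rxy, n=30, f_out=4.0)
print("predictors retained (by index):", kept)
```

In this made-up case the second predictor has a partial F of zero and is removed on the first pass, after which the remaining predictor passes the threshold and the procedure stops.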


Use of Mallows' Cp 


Before we introduce Mallows' $C_p$, it is important to consider the consequences of 
underfitting (important variables are left out of the model) and overfitting (having 
variables in the model that make essentially no contribution or are marginal). 
Myers (1990, pp. 178-180) has an excellent discussion on the impact of 
underfitting and overfitting, and notes that, “A model that is too simple may suffer 
from biased coefficients and biased prediction, while an overly complicated model 
can result in large variances, both in the coefficients and in the prediction.” 

This measure was introduced by Mallows (1973) as a criterion for selecting a 
model. It measures total squared error, and Mallows recommended choosing the 
model(s) for which $C_p \approx p$, where $p = k + 1$. 
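
The text does not give a formula for the criterion, so the sketch below assumes the common textbook definition, Cp = SSE_p / s² − (n − 2p), where SSE_p is the error sum of squares for the subset model and s² is the mean square error from the model with all predictors. The function name and every number in the example are hypothetical.

```python
def mallows_cp(sse_subset: float, mse_full: float, n: int, p: int) -> float:
    """Mallows' Cp = SSE_p / s^2 - (n - 2p), where s^2 is the full-model MSE
    and p = k + 1 (k predictors plus the intercept). Assumed definition."""
    return sse_subset / mse_full - (n - 2 * p)


# Hypothetical numbers: n = 46 cases, full model with 6 predictors (p = 7).
n = 46
sse_full = 78.0
mse_full = sse_full / (n - 7)  # 78 / 39 = 2.0

# For the full model itself, Cp equals p by construction.
cp_full = mallows_cp(sse_full, mse_full, n, p=7)

# A hypothetical 4-predictor subset (p = 5) with a slightly larger SSE.
cp_subset = mallows_cp(82.0, mse_full, n, p=5)

print("Cp, full model:", cp_full)
print("Cp, 4-predictor subset:", cp_subset)
```

A subset whose Cp lands near its own p, as the hypothetical four-predictor subset does here, gives up little in fit relative to the full model while using fewer predictors.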


All Possible Regressions 


If you wish to follow this route, then the SAS REG procedure should be considered. 
The number of regressions increases quite sharply as k increases; however, 
the program will efficiently identify good subsets. Good subsets are those which 
have the smallest Mallows' $C_p$ value. 

Use of one or more of the above methods will often yield a number of models of 
roughly equal efficacy. As Myers (1990) noted, “The successful model builder will 
eventually understand that with many data sets, several models can be fit that 
would be of nearly equal effectiveness. Thus, the problem that one deals with is the 
selection of one model from a pool of candidate models” (p. 164). One of the 
problems with the stepwise methods, which are frequently used, is that they have 
led many researchers to conclude they have found the best model, when in fact 
there may be some better models and/or several other models that are about as 
good. As Huberty (1989) notes, “And one or more of these subsets may be more 
interesting or relevant in a substantive sense” (p. 46). 

As mentioned earlier, Mallows' criterion is useful in guarding against both 
underfitting and overfitting. Another very important criterion that can be used to 
select from the candidate pool relates to the generalizability of the prediction 
equation, i.e., validating the equation. Three methods of model validation are discussed 
in Section 6.11. Briefly, they are: 

1. Data splitting—Randomly split the data, obtain a prediction equation on 
one part of the random split, and then check its predictive power on the 
other sample. 

2. Use of the PRESS statistic. 

3. Obtain an estimate of the average predictive power of the equation on many 
other samples from the same population, using a formula due to Stein 
(Herzberg, 1969). 


The SPSS application guides comment on overfitting and the use of several 
models. There is no one test to determine the dimensionality of the best submodel. 
Some researchers find it tempting to include too many variables in the model, 
which is called overfitting. Such a model will perform badly when applied to a new 
sample from the same population (cross validation). Automatic stepwise procedures 
cannot do all the work for you. Use them as a tool to determine roughly the 
number of predictors needed (for example, you might find 3 to 5 variables). If you 
try several methods of selection, you may identify candidate predictors that are not 
included by any method. Ignore them and fit models with, say, 3 to 5 variables, selecting 
alternative subsets from among the better candidates. You may find several subsets 
that perform equally well. Then knowledge of the subject matter, how accurately 
individual variables are measured, and what a variable “communicates” may 
guide selection of the model to report. 

I do not disagree with the above comments; however, I would favor the model that 
cross-validates best. If two models cross-validate about equally well, then I would 
favor the model that makes the most substantive sense. 


6.9 TWO COMPUTER EXAMPLES 


To illustrate the use of several of the aforementioned model selection methods, we 
consider two computer examples. The first example illustrates the SPSS 
REGRESSION program, and uses data from Morrison (1983) on 32 students en- 
rolled in an MBA course. We predict instructor course evaluation from 5 predic- 
tors. The second example illustrates SAS REG on quality ratings of 46 research 
doctorate programs in psychology, where we are attempting to predict quality rat- 
ings from factors such as number of program graduates, percentage of graduates 
that received fellowships or grant support, etc. (Singer & Willett, 1988). 


Example 6—SPSS REGRESSION on Morrison MBA Data 


The data for this problem are from Morrison (1983). The dependent variable is in- 
structor course evaluation in an MBA course, with the five predictors being clarity, 
stimulation, knowledge, interest, and course evaluation. We illustrate two of the 


TABLE 6.3 
SPSS Control Syntax for Stepwise Regression Run on MORRISON Data 
and Correlation Matrix 


TITLE 'MULTIPLE REGRESSION-MORRISON DATA'. 


DATA LIST FREE/INSTEVAL CLARITY STIMUL KNOWLEDG INTEREST 
COUEVAL. 
BEGIN DATA. 
1121 127212775121 111112 117271 12 
213222 224112 23 39112 234123 
22.3 4 3.5. 22 2222. 22 3.21012 2.2 Z2 3 3 2 
222042 2, 224222 2323113 234112 
232112 344322 343114 343123 
343223 334233 334233 343112 
345113 335123 344123 344113 
333213 335112 455234 445234 
END DATA. 
LIST. 
① REGRESSION DESCRIPTIVES = DEFAULT/ 
  VARIABLES = INSTEVAL TO COUEVAL/ 
② STATISTICS = DEFAULT TOL SELECTION/ 
  DEPENDENT = INSTEVAL/ 
③ METHOD = STEPWISE/ 
④ CASEWISE = ALL PRED RESID ZRESID LEVER COOK/ 
⑤ SCATTERPLOT (*RES,*PRE)/. 

CORRELATION MATRIX 
            INSTEVAL  CLARITY  STIMUL  KNOWLEDGE  INTEREST  COUEVAL 
INSTEVAL      1.000     .862     .739     .282      .435      .738 
CLARITY        .862    1.000     .617     .057      .200      .651 
STIMUL         .739     .617    1.000     .078      .317      .523 
KNOWLEDGE      .282     .057     .078    1.000      .583      .041 
INTEREST       .435     .200     .317     .583     1.000      .448 
COUEVAL        .738     .651     .523     .041      .448     1.000 


① The DESCRIPTIVES = DEFAULT subcommand yields the means, standard deviations, and the 
correlation matrix for the variables. 

② The TOL part of this STATISTICS subcommand yields useful information concerning 
multicollinearity. In particular, it yields the VIFs (variance inflation factors). The SELECTION part 
yields, among other things, Mallows' prediction criterion, which is very useful in selecting a set of 
predictors. 

③ To obtain the backward selection procedure, we would simply put METHOD = BACKWARD/ 

④ This CASEWISE subcommand yields important regression diagnostics: ZRESID (standardized 
residuals—for identifying outliers on y), LEVER (hat elements—for identifying outliers on 
predictors), and COOK (Cook's distance—for identifying influential data points). 

⑤ This SCATTERPLOT subcommand yields the plot of the residuals vs. the predicted values, 
which is very useful for determining whether any of the assumptions underlying the linear regression 
model may be violated. 


TABLE 6.4 
Selected Printout From SPSS Syntax Editor Stepwise Regression Run 
on the Morrison MBA Data 

(Placeholder for the Table 6.4 printout.) 


sequential procedures, stepwise and backward selection, using the SPSS 
REGRESSION program. The control lines for running the analyses, along with the 
correlation matrix, are given in Table 6.3. 

SPSS REGRESSION has “p values,” denoted by PIN and POUT, which govern 
whether a predictor will enter the equation and whether it will be deleted. The de- 
fault values are PIN = .05 and POUT = .10. In other words, a predictor must be 
"significant" at the .05 level to enter, or must not be significant at the .10 level to be 
deleted. 

First, we discuss the stepwise procedure results. Examination of the correlation 
matrix in Table 6.3 reveals that three of the predictors (CLARITY, STIMUL, and 
COUEVAL) are strongly related to INSTEVAL (simple correlations of .862, .739, 
and .738, respectively). Because clarity has the highest correlation, it will enter the 
equation first. Superficially, it might appear that STIMUL or COUEVAL would 
enter next; however, we must take into account how these predictors are correlated 
with CLARITY, and indeed both have fairly high correlations with CLARITY 
(.617 and .651 respectively). Thus, they will not account for as much unique vari- 
ance on INSTEVAL, above and beyond that of CLARITY, as first appeared. On the 
other hand, INTEREST, which has a considerably lower correlation with 
INSTEVAL (.44), is only correlated .20 with CLARITY. Thus, the variance on 
INSTEVAL it accounts for is relatively independent of the variance CLARITY ac- 
counted for. And, as seen in Table 6.4, it is INTEREST that enters the regression 
equation second. 


TABLE 6.5 
Selected Printout From SPSS Regression for Backward Selection 
on the Morrison MBA Data 

(Placeholder for the Table 6.5 printout.) 


STIMUL is the third and final predictor to enter, since its p value (.0086) is less 
than the default value of .05. Finally, the other predictors (KNOWLEDGE and 
COUEVAL) don’t enter since their p values (.0989 and .1288) are greater than .05. 

Selected printout from the backward selection procedure appears in Table 6.5. 
First, all of the predictors are put into the equation. Then, the procedure determines 
which of the predictors makes the least contribution when entered last in the 
equation. That predictor is INTEREST, and since its p value is .9097, it is deleted from 
the equation. None of the other predictors can be further deleted because their p 
values are much less than .10. 

Interestingly, note that two different sets of predictors emerge from the two se- 
quential selection procedures. The stepwise procedure yields the set (CLARITY, 
INTEREST, and STIMUL), while the backward procedure yields the set 
(COUEVAL, KNOWLEDGE, STIMUL, and CLARITY). However, CLARITY 
and STIMUL are common to both sets. On the grounds of parsimony, we might 
prefer the set (CLARITY, INTEREST, and STIMUL), especially since the adjusted 
R²s for the two sets are quite close (.84 and .87). 

There are three other things that should be checked out before settling on 
this as our chosen model: 


1. We need to determine if the assumptions of the linear regression model 
are tenable. 

2. We need an estimate of the cross-validity power of the equation. 

3. We need to check for the existence of outliers and/or influential data 
points. 


Figure 6.4 showed the plot of the residuals versus the predicted values from 
SPSS. This plot showed essentially random variation of the points about the hori- 
zontal line of 0, indicating no violations of assumptions. 

The issues of cross-validity power and outliers are considered later in this chapter, 
and are applied to this problem in Section 6.15, after both topics have been covered. 


TABLE 6.6 
SAS REG Control Lines for Stepwise and MAXR Runs on the National 
Academy of Sciences Data and the Correlation Matrix 


DATA SINGER; 

INPUT QUALITY NFACUL NGRADS PCTSUPP PCTGRT NARTIC PCTPUB; 
CARDS; 

DATA LINES 

PROC REG SIMPLE CORR; 

MODEL QUALITY = NFACUL NGRADS PCTSUPP PCTGRT NARTIC PCTPUB/ 
SELECTION = STEPWISE VIF R INFLUENCE; 

MODEL QUALITY = NFACUL NGRADS PCTSUPP PCTGRT NARTIC PCTPUB/ 
SELECTION = MAXR VIF R INFLUENCE; 


SIMPLE is needed to obtain descriptive statistics (means, variances, etc) for all variables. 
CORR is needed to obtain the correlation matrix for the variables. 


In this MODEL statement, the dependent variable goes on the left and all predictors to the right 
of the equals. 

SELECTION is where we indicate which of the 9 procedures we wish to use. There is a wide 
variety of other information we can get printed out. Here we have selected VIF (variance 
inflation factors), R (analysis of residuals—standard residuals, hat elements, Cook's D), and 
INFLUENCE (influence diagnostics). 


Note that there are two separate MODEL statements for the two regression procedures being 
requested. Although multiple procedures can be obtained in one run, you must have a separate 
MODEL statement for each procedure. 


CORRELATION MATRIX 
               NFACUL  NGRADS  PCTSUPP  PCTGRT  NARTIC  PCTPUB  QUALITY 
                  2       3       4        5       6       7       1 
NFACUL    2    1.000 
NGRADS    3    0.692   1.000 
PCTSUPP   4    0.395   0.337   1.000 
PCTGRT    5    0.162   0.071   0.351    1.000 
NARTIC    6    0.755   0.646   0.366    0.436   1.000 
PCTPUB    7    0.205   0.171   0.347    0.490   0.593   1.000 
QUALITY   1    0.622   0.418   0.582    0.700   0.762   0.585   1.000 


Example 7—SAS REG on Doctoral Programs 
in Psychology 


The data for this example come from a National Academy of Sciences report 
(Jones, Lindzey, & Coggeshall, 1982) that, among other things, provided ratings on 
the quality of 46 research doctoral programs in psychology. The six variables used 
to predict quality are: 

NFACULTY—number of faculty members in the program as of December 
1980. 

NGRADS—number of program graduates from 1975 through 1980. 

PCTSUPP—percentage of program graduates from 1975-1979 that received 
fellowships or training grant support during their graduate education. 

PCTGRANT—percentage of faculty members holding research grants from 
the Alcohol, Drug Abuse, and Mental Health Administration, the National 
Institutes of Health, or the National Science Foundation at any time during 
1978-1980. 

NARTICLE—number of published articles attributed to program faculty members 
from 1978-1980. 

PCTPUB—percentage of faculty with one or more published articles from 
1978-1980. 


Both the stepwise procedure and the MAXR procedure were used on these data 
to generate several regression models. The control lines for doing this, along with 
the correlation matrix, are given in Table 6.6. 

The stepwise procedure terminated after 4 predictors entered. Below is the sum- 
mary table, exactly as it appears on the printout: 


Summary of Stepwise Procedure for Dependent Variable QUALITY 


        Variable              Partial    Model 
Step    Entered    Removed    R**2       R**2       C(p)        F          Prob > F 
1       NARTIC                0.5809     0.5809     55.1185     60.9861    0.0001 
2       PCTGRT                0.1668     0.7477     18.4760     28.4156    0.0001 
3       PCTSUPP               0.0569     0.8045      7.2970     12.2197    0.0011 
4       NFACUL                0.0176     0.8221      5.2161      4.0595    0.0505 


This four-predictor model appears to be a reasonably good one. First, Mallows' 
$C_p$ is very close to p (recall p = k + 1), that is, 5.216 ≈ 5, indicating that there is not 
much bias in the model. Second, R² = .8221, indicating that we can predict quality 
quite well from the 4 predictors. Although this R² is not adjusted, the adjusted 
value will not differ much because we have not selected from a large pool of 
predictors. 

Selected printout from the MAXR procedure run appears in Table 6.7. From Ta- 
ble 6.7 we can construct the following results: 


BEST MODEL VARIABLE(S) MALLOWS' Cp 
for 1 variable NARTIC 55.118 
for 2 variables PCTGRT, NFACUL 16.859 
for 3 variables PCTPUB, PCTGRT, NFACUL 9.147 
for 4 variables NFACUL, PCTSUPP, PCTGRT, NARTIC 5.216 


In this case, the same 4 predictor model is selected by the MAXR procedure that 
was selected by the stepwise procedure. 


Caveat on p Values for the “Significance” of Predictors 


The p values that are given by SPSS and SAS for the “significance” of each predictor 
at each step for stepwise or the forward selection procedures should be treated 
with caution, especially if your initial pool of predictors is moderate (15) or large 
(30). The reason is that the ordinary F distribution is not appropriate here, because 
the largest F is being selected out of all Fs available. Thus, the appropriate critical 
value will be larger (and can be considerably larger) than would be obtained from 
the ordinary null F distribution. Draper and Smith (1981) note, “Studies have 
shown, for example, that in some cases where an entry F test was made at the α 
level, the appropriate probability was qα, where there were q entry candidates at 
that stage” (p. 311). This is saying, for example, that an experimenter may think his 
or her probability of erroneously including a predictor is .05, when in fact the actual 
probability of erroneously including the predictor is .50 (if there were 10 entry 
candidates at that point)! 
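
A short calculation shows how quickly the qα figure grows with the number of entry candidates. The sketch below is illustrative only. It treats qα as the Bonferroni upper bound on the chance that at least one of q null candidates passes the entry test, and, as a point of comparison, also computes that chance under the added assumption (mine, not the text's) that the q tests are independent. Both function names are hypothetical.

```python
def entry_error_bound(alpha: float, q: int) -> float:
    """Upper bound q * alpha (capped at 1) on the chance that at least one of
    q null candidates passes an entry test run at level alpha."""
    return min(1.0, q * alpha)


def entry_error_independent(alpha: float, q: int) -> float:
    """The same probability if the q tests were independent (an assumption)."""
    return 1.0 - (1.0 - alpha) ** q


for q in (1, 5, 10, 30):
    bound = entry_error_bound(0.05, q)
    indep = entry_error_independent(0.05, q)
    print(f"q = {q:2d}: bound = {bound:.3f}, independent tests = {indep:.3f}")
```

Either way, with 10 candidates the chance of admitting a useless predictor at a nominal .05 level is many times .05.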

Thus, the F tests are positively biased, and the greater the number of predictors, 
the larger the bias. Hence, these F tests should be used only as rough guides to the 
usefulness of the predictors chosen. The acid test is how well the predictors do un- 
der cross-validation. It can be unwise to use any of the stepwise procedures with 20 
or 30 predictors and only 100 subjects, since capitalization on chance is great, and 
the results may well not cross-validate. To find an equation that probably will have 
generalizability, it is best to carefully select (using substantive knowledge and/or 
any previous related literature) a small or relatively small set of predictors. 


6.10 CHECKING ASSUMPTIONS 
FOR THE REGRESSION MODEL 


Recall that in the linear regression model it is assumed that the errors are independ- 
ent and follow a normal distribution with constant variance. The normality as- 
sumption can be checked through use of the histogram of the standardized residu- 
als. The independence assumption implies that the subjects are responding 
independently of one another. This is an important assumption. Even a slight viola- 
tion can cause the type I error rate to be several times greater than what one desires. 

Let us consider a situation where the independence assumption would not be 
tenable. Suppose we had 50 college freshmen each write four in-class essays. Then, 
although we have 200 essays to grade, we have only 50 independent responses, 
since the responses for each student are going to be correlated. 


Residual Plots 


There are various plots available for assessing potential problems with the regression 
model (Draper & Smith, 1981; Weisberg, 1985). A very useful plot graphs the 
standardized residuals (y) vs. the predicted values (x). If the assumptions of the linear 
regression model are tenable, then the standardized residuals should scatter 
randomly about a horizontal line of 0, as shown in Figure 6.3a (see Section 6.3). 
Any systematic pattern or clustering of the residuals suggests a model violation(s). 
Three such systematic patterns are shown in Figures 6.3b to 6.3d. Figure 6.3b 
shows a systematic quadratic (second degree equation) clustering of the residuals. 
For Figure 6.3c the variability of the residuals increases systematically as the pre- 
dicted values increase, suggesting a violation of the constant variance assumption. 

In Figure 6.7 we present residual plots for three real data sets. The first plot is 
for the Morrison data (the first computer example), and shows essentially random 
scatter of the residuals, suggesting no violations of assumptions. The remaining 
two plots are from a study by a statistician who analyzed the salaries of over 260 
major league hitters, using predictors such as career batting average, career home 
runs per time at bat, years in the major leagues, etc. These plots are from Moore 
and McCabe (1989), and are used with permission. Figure 6.7b, which plots the re- 
siduals versus predicted salaries, shows a clear violation of the constant variance 
assumption. For lower predicted salaries there is little variability about 0, but for 
the high salaries there is considerable variability of the residuals. The implication 
of this is that the model will predict lower salaries quite accurately, but not so for 
the higher salaries. 

Figure 6.7c plots the residuals versus number of years in the major leagues. This 
plot shows a clear curvilinear clustering, that is, quadratic. The curved lines en- 
compass the vast majority of points to make this trend even more evident. The im- 
plication of this curvilinear trend is that the regression model will tend to overesti- 
mate the salaries of players who have been in the major leagues only a few years or 
over 15 years, while it will underestimate the salaries of players who have been in 
the majors about 5 to 9 years. 

In concluding this section, note that if nonlinearity or nonconstant variance is 
found, there are various remedies. For nonlinearity, perhaps a polynomial model is 
needed. Or sometimes a transformation of the data will enable a nonlinear model 
to be approximated by a linear one. For nonconstant variance, weighted least 
squares is one possibility, or more commonly, a variance stabilizing transformation 
(such as square root or log) may be used. I refer the reader to Weisberg (1985, 
Chapter 6) for an excellent discussion of remedies for regression model violations. 


FIGURE 6.7 Residual Plots for Three Real Data Sets Showing No Violations, 
Heterogeneous Variance, and Curvilinearity. Panel titles: (b) Model Violation: 
Heterogeneous Variance; (c) Model Violation: Curvilinearity. 


6.11 MODEL VALIDATION 


We indicated earlier that it was crucial for the researcher to obtain some measure of 
how well the regression equation will predict on an independent sample(s) of data. 
That is, it was important to determine whether the equation had generalizability. We 
discuss here two methods of model validation: one empirical, and the other involving 
an estimate of average predictive power on other samples. A third method of model 
validation, particularly useful when one has a small or moderate sample, utilizes 
what is called the PRESS statistic. This is a nice empirical measure, but it is more 
complicated, so I have put it in an Appendix to this chapter for those who are inter- 
ested. Let me give a brief description of the two methods, and then I will elaborate on 
each form of validation. 


Data Splitting. Here the sample is randomly split in half. It does not have to 
be split evenly, but we use this for illustration. The regression equation is found on 
the so-called derivation sample (also called the screening sample, or the sample 
that “gave birth” to the prediction equation by Tukey). This prediction equation is 
then applied to the other sample of data (called the validation sample) to see how 
well it predicts the y score there. 

Compute an Adjusted R². There are various adjusted R² measures, or 
measures of shrinkage in predictive power, but they do not estimate the same thing. 
The one most commonly used, and that which is printed out by SPSS and SAS, is 
due to Wherry. It is very important to note that the Wherry formula estimates how 
much variance on y would be accounted for if we had derived the equation in the 
population from which the sample was drawn. The Wherry formula does not 
indicate how well the derived equation will predict on other samples from the same 
population. A formula due to Stein (1960) does estimate average cross-validation 
predictive power. Unfortunately, it was not printed out by SPSS and SAS about 10 
years ago, and it is still not printed out by either package. 


Data Splitting 


Recall that the sample is randomly split. The regression equation is found on the 
derivation sample and then is applied to the other sample (validation) to determine 
how well it will predict y there. Below we give a hypothetical example, randomly 
splitting 100 subjects. 


Derivation Sample                          Validation Sample 
n = 50                                     n = 50 

Prediction Equation                        y       x1      x2 
ŷ = 4 + .3x1 + .7x2                        6       1       .5 
                                           4.5     2       .3 
                                           ⋮ 
                                           7       5       .2 


Now, using the above prediction equation we predict the y scores in the valida- 
tion sample: 


\[
\begin{aligned}
\hat{y}_1 &= 4 + .3(1) + .7(.5) = 4.65 \\
\hat{y}_2 &= 4 + .3(2) + .7(.3) = 4.81 \\
&\;\;\vdots \\
\hat{y}_{50} &= 4 + .3(5) + .7(.2) = 5.64
\end{aligned}
\]


The cross-validated R then is the correlation for the following set of scores: 


y       ŷ 
6       4.65 
4.5     4.81 
⋮ 
7       5.64 
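
The data-splitting steps can be sketched in code. This is a minimal illustration under stated assumptions: the subject IDs and the random seed are arbitrary, the prediction equation is the hypothetical one from the example (in practice it would be estimated on the derivation half), and only the three validation cases listed in the text are used, so the resulting correlation is not the cross-validated R for all 50 cases.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def predict(x1, x2):
    """Hypothetical derivation-sample equation: y-hat = 4 + .3*x1 + .7*x2."""
    return 4 + 0.3 * x1 + 0.7 * x2


# Step 1: randomly split 100 subject IDs into two halves (seed is arbitrary).
ids = list(range(1, 101))
random.Random(0).shuffle(ids)
derivation_ids, validation_ids = ids[:50], ids[50:]

# Step 2: apply the derivation equation to the validation cases.
validation = [(6, 1, 0.5), (4.5, 2, 0.3), (7, 5, 0.2)]  # (y, x1, x2)
y_obs = [y for y, _, _ in validation]
y_hat = [predict(x1, x2) for _, x1, x2 in validation]
print("predicted y:", [round(v, 2) for v in y_hat])

# Step 3: the cross-validated R is the correlation between y and y-hat.
r_cv = pearson_r(y_obs, y_hat)
print("correlation of y with y-hat (three listed cases only):", round(r_cv, 3))
```

With real data, steps 2 and 3 would use every case in the validation half.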
Adjusted R² 


Herzberg (1969) presents a discussion of various formulas that have been used to 
estimate the amount of shrinkage found in R². As mentioned earlier, the one most 
commonly used, and due to Wherry, is given by 


\[
\hat{\rho}^2 = 1 - \frac{(n-1)}{(n-k-1)}\,(1 - R^2) \qquad (5)
\]

where $\hat{\rho}$ is the estimate of $\rho$, the population multiple correlation coefficient. This 
is the adjusted R² printed out by SAS and SPSS. Draper and Smith (1981) comment 
on Equation 5: “A related statistic...is the so called adjusted r ($R_a^2$), the 
idea being that the statistic $R_a^2$ can be used to compare equations fitted not only 
to a specific set of data but also to two or more entirely different sets of data. 
The value of this statistic for the latter purpose is, in our opinion, not high” (p. 
92). 

Herzberg notes that, “In applications, the population regression function can 
never be known and one is more interested in how effective the sample regression 
function is in other samples. A measure of this effectiveness is $r_c$, the sample 
cross-validity. For any given regression function $r_c$ will vary from validation sample 
to validation sample. The average value of $r_c$ will be approximately equal to the 
correlation, in the population, of the sample regression function with the criterion. 
This correlation is the population cross-validity, $\rho_c$. Wherry's formula estimates $\rho$ 
rather than $\rho_c$” (p. 4). 

There are two possible models for the predictors: (1) regression—the values of 
the predictors are fixed, i.e., we study y only for certain values of x, and (2) 
correlation—the predictors are random variables—this is a much more reasonable model 
for social science research. Herzberg presents the following formula for estimating 
$\rho_c^2$ under the correlation model: 


\[
\hat{\rho}_c^2 = 1 - \frac{(n-1)}{(n-k-1)}\,\frac{(n-2)}{(n-k-2)}\,\frac{(n+1)}{n}\,(1 - R^2) \qquad (6)
\]


where n is sample size and k is the number of predictors. It can be shown that 
$\rho_c < \rho$. 

If you are interested in cross-validity predictive power, then the Stein formula 
(Equation 6) should be used. As an example, suppose n = 50, k = 10, and R² = .50. If 
you use the Wherry formula (Equation 5), then your estimate is 


\[
\hat{\rho}^2 = 1 - (49/39)(.50) = .372
\]


whereas with the proper Stein formula you would obtain 


\[
\hat{\rho}_c^2 = 1 - (49/39)(48/38)(51/50)(.50) = .191
\]

In other words, use of the Wherry formula would give a misleadingly positive 
impression of the cross-validity predictive power of the equation. 

Table 6.8 shows how the estimated predictive power drops off using the Stein 
formula (Equation 6) for small to fairly large subject/variable ratios when R² = .50. 
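
Because neither package prints the Stein estimate, it has to be computed by hand or with a few lines of code. The sketch below implements Equations 5 and 6 as written above; the function names are my own and are not part of SPSS or SAS.

```python
def wherry_adjusted_r2(r2: float, n: int, k: int) -> float:
    """Equation 5: Wherry's adjusted R^2."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


def stein_cross_validity(r2: float, n: int, k: int) -> float:
    """Equation 6: Stein's estimate of average cross-validity predictive power."""
    shrink = ((n - 1) / (n - k - 1)) * ((n - 2) / (n - k - 2)) * ((n + 1) / n)
    return 1 - shrink * (1 - r2)


# The worked example from the text: n = 50, k = 10, R^2 = .50.
print("Wherry:", round(wherry_adjusted_r2(0.50, 50, 10), 3))
print("Stein: ", round(stein_cross_validity(0.50, 50, 10), 3))

# Stein estimates for R^2 = .50 as the subject/variable ratio grows.
for n in (50, 100, 150):
    print(f"n = {n}, k = 10: {stein_cross_validity(0.50, n, 10):.3f}")
```

The first two printed values reproduce the .372 and .191 of the worked example, and the loop shows the Stein estimate rising toward the sample R² as n increases with k held at 10.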


6.12 IMPORTANCE OF THE ORDER 
OF THE PREDICTORS IN REGRESSION ANALYSIS 


The order in which the predictors enter a regression equation can make a great deal 
of difference with respect to how much variance on y they account for, especially 
for moderate or highly correlated predictors. Only for uncorrelated predictors 
(which would rarely occur in practice) does the order not make a difference. We 
give two examples to illustrate. 


Example 8 


A dissertation by Crowder (1975) attempted to predict ratings of trainably mentally 
retarded individuals (TMs) using I.Q. ($x_2$) and scores from a Test of Social 
Inference (TSI). He was especially interested in showing that the TSI had incremental 
predictive validity. The criterion was the average ratings by two individuals 
in charge of the TMs. The intercorrelations among the variables were: 


\[
r_{x_1 x_2} = .59, \qquad r_{y x_2} = .54, \qquad r_{y x_1} = .566
\]


Now, consider two orderings for the predictors, one where TSI is entered first, 
and the other ordering where 1.0. is entered first. 


First Ordering                  Second Ordering
        % of variance                   % of variance
TSI         32.04               I.Q.        29.16
I.Q.         6.52               TSI          9.40


The first ordering conveys an overly optimistic view of the utility of the TSI
scale. Since we know that I.Q. will predict ratings, it should be entered first in the
equation (as a control variable), and then TSI to see what its incremental validity is,
i.e., how much it adds to predicting ratings above and beyond what I.Q. does.
Because of the moderate correlation between I.Q. and TSI, the amount of variance
accounted for by TSI differs considerably when entered first vs. second (32.04 vs.
9.40).
TABLE 6.8
Estimated Cross Validity Predictive Power for Stein Formula(1)


Subject/Variable Ratio               Stein Estimate

Small (5:1)
    N = 50, k = 10, R² = .50             .191
    N = 50, k = 10, R² = .75             .595
    N = 50, k = 10, R² = .85             .757
Moderate (10:1)
    N = 100, k = 10, R² = .50            .374
    N = 100, k = 10, R² = .75            .690
Fairly Large (15:1)
    N = 150, k = 10, R² = .50            .421


(1) If there is selection of predictors from a larger set, then the median should be used as the k. For
example, if 4 predictors were selected from a set of 30 predictors by, say, the stepwise procedure, then the
median between 4 and 30 (that is, 17) should be the k used in the Stein formula.


The 9.4% of variance accounted for by TSI when entered second is obtained
through the use of the semipartial correlation previously introduced:


\[ r_{y1.2(s)} = \frac{.566 - .54(.59)}{\sqrt{1 - .59^2}} = .306, \qquad r_{y1.2(s)}^2 = .094 \]
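
The two orderings in Example 8 can be reproduced from the three correlations alone. The following Python sketch is ours (the function and variable names are illustrative, not from Crowder's study); it applies the semipartial formula to each predictor in turn. The results agree with the percentages reported above to within rounding.

```python
import math


def semipartial(r_y1, r_y2, r_12):
    """Semipartial correlation of y with x1, with x2 partialed out of x1 only."""
    return (r_y1 - r_y2 * r_12) / math.sqrt(1 - r_12 ** 2)


# Correlations from Example 8: TSI is x1, I.Q. is x2.
r_y_tsi, r_y_iq, r_tsi_iq = 0.566, 0.54, 0.59

# First ordering: TSI entered first, I.Q. second.
tsi_first = r_y_tsi ** 2
iq_second = semipartial(r_y_iq, r_y_tsi, r_tsi_iq) ** 2

# Second ordering: I.Q. entered first, TSI second.
iq_first = r_y_iq ** 2
tsi_second = semipartial(r_y_tsi, r_y_iq, r_tsi_iq) ** 2

print(f"TSI first:  {100 * tsi_first:.2f}%   I.Q. second: {100 * iq_second:.2f}%")
print(f"I.Q. first: {100 * iq_first:.2f}%   TSI second:  {100 * tsi_second:.2f}%")
```

Whatever the ordering, the two percentages sum to the same total, because the total variance accounted for by both predictors does not depend on the order of entry.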


Example 9 


Consider the following matrix of correlations for a three predictor problem: 


       x1     x2     x3
y     .60    .70    .70
x1           .70    .60
x2                  .80


How much variance on y will x3 account for if entered first, and if entered last?
If x3 is entered first, then it will account for (.7)² × 100 or 49% of the variance on y.
If x3 is entered last, we need to compute a second order semipartial correlation (see
Stevens, 1996, p. 102 for details). The answer is only 4.8% of the variance on y.
Because the predictors are so highly correlated, most of the variance on y that x3 could
have accounted for has already been accounted for by x1 and x2.
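
One way to check the 4.8% figure without the second order semipartial formula is to compute R² with and without x3 and take the difference. The Python sketch below is ours, not the text's method: it uses the standard result that R² equals the sum of each standardized regression weight times the corresponding validity, and a small hand-written equation solver (the helper names are hypothetical) so that nothing beyond the standard library is needed.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(r_xx, r_xy):
    """Squared multiple correlation from predictor intercorrelations and validities."""
    betas = solve(r_xx, r_xy)  # standardized regression weights
    return sum(r * b for r, b in zip(r_xy, betas))


# Correlation matrix from Example 9.
r_xx = [[1.0, 0.7, 0.6],
        [0.7, 1.0, 0.8],
        [0.6, 0.8, 1.0]]
r_xy = [0.6, 0.7, 0.7]

x3_first = r_xy[2] ** 2
r2_x1_x2 = r_squared([row[:2] for row in r_xx[:2]], r_xy[:2])
r2_all = r_squared(r_xx, r_xy)
x3_last = r2_all - r2_x1_x2  # squared second order semipartial for x3

print(f"x3 entered first: {100 * x3_first:.1f}%   x3 entered last: {100 * x3_last:.1f}%")
```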
Controlling the Order of Predictors in the Equation


With the forward and stepwise selection procedures, the order of entry of predic- 
tors into the regression equation is determined via a mathematical maximization 
procedure. That is, the first predictor to enter is the one with the largest (maxi- 
mized) correlation with y, the second to enter is the predictor with the largest 
partial correlation, etc. However, there are situations where one may not want the 
mathematics to determine the order of entry of predictors. For example, suppose 
we have a five predictor problem, with two proven predictors from previous re- 
search. The other three predictors are included to see if they have any incremental 
validity. In this case we would want to enter the two proven predictors in the
equation first (as control variables), and then let the remaining three predictors "fight it
out" to determine whether any of them add anything significant to predicting y
above and beyond the proven predictors. 

With SPSS REGRESSION or SAS REG we can control the order of predictors, 
and in particular, we can force predictors into the equation. In Table 6.9 we illus- 
trate how this is done for SPSS and SAS for the above five predictor situation. 


6.13 OTHER IMPORTANT ISSUES 


Preselection of Predictors 


An industrial psychologist hears about the predictive power of multiple regression 
and is excited. He wants to predict success on the job, and gathers data for 20 po- 
tential predictors on 70 subjects. He obtains the correlation matrix for the vari- 
ables, and then picks out 6 predictors that correlate significantly with success on 
the job and that have low intercorrelations among themselves. The analysis is run, 
and the R² is highly significant. Furthermore, he is able to explain 52% of the
variance on y (more than other investigators have been able to do). Are these results
generalizable? Probably not, since what he did involves a double capitalization on 
chance: 


1. First, in preselecting the predictors from a larger set, he is capitalizing on 
chance. Some of these variables would have high correlations with y be- 
cause of sampling error, and consequently their correlations would tend to 
be lower in another sample. 

2. Second, the mathematical maximization involved in obtaining the multiple 
correlation involves capitalizing on chance. 


Preselection of predictors is common among many researchers, who are un- 
aware of the fact that this tends to make their results sample specific. Nunnally 
(1978) has a nice discussion of the preselection problem, and Wilkinson (1979)
has shown the considerable positive bias preselection can have on the test of
significance of R² in forward selection. The following example from his tables
illustrates. The critical value for a 4 predictor problem (n = 35) at the .05 level is .26, while
the appropriate critical value for the same n and α level, when preselecting 4
predictors from a set of 20 predictors, is .51! Unawareness of the positive bias has led
to many results in the literature that are not replicable, for as Wilkinson notes, “A 
computer assisted search for articles in psychology using stepwise regression from 
1969 to 1977 located 71 articles. Out of these articles, 66 forward selections analy- 
ses reported as significant by the usual F tests were found. Of these 66 analyses, 19 
were not significant by [his] Table 1.” 

It is important to note that both the Wherry and Herzberg formulas do not take 
into account preselection. Hence, the following from Cohen and Cohen (1983) 
should be seriously considered: “A more realistic estimate of the shrinkage is ob- 
tained by substituting for k the total number of predictors from which the selection 
was made" (p. 107). In other words, they are saying if 4 predictors were selected 
out of 15, use k = 15 in the Herzberg formula. While this may be conservative,
using 4 will certainly lead to a positive bias. Probably a median value between 4 and
15 would be closer to the mark, although this needs further investigation. 


Positive Bias of R²


A study by Schutz (1977) on California principals and superintendents illustrates 
how capitalization on chance in multiple regression (if the researcher is unaware of 
it) can lead to misleading conclusions. Schutz was interested in validating a “con- 
tingency theory of leadership,” that is, that success in administering schools calls 
for different personality styles depending on the social setting of the school. The 
theory seems plausible, and in what follows we are not criticizing the theory per se, 
but the empirical validation of it. Schutz's procedure for validating the theory in- 
volved establishing a relationship between various personality attributes (24 pre- 
dictors) and several measures of administrative success in heterogeneous samples 
with respect to social setting using multiple regression, that is, find the multiple R 
for each measure of success on 24 predictors. Then he showed that the magnitude 
of the relationships was greater for subsamples homogeneous with respect to so- 
cial setting. The problem was that he had nowhere near adequate sample size for a 
reliable prediction equation. Below we present the total sample sizes and the 
subsamples homogeneous with respect to social setting: 


                 Superintendents     Principals

Total            n = 77              n = 147
Subsample(s)     n = 29              n1 = 35, n2 = 61, n3 = 36
TABLE 6.9
Controlling the Order of Predictors and Forcing Predictors Into the
Equation With SPSS REGRESSION and SAS REG


SPSS REGRESSION

    TITLE 'FORCING X3 AND X4 & USING STEPWISE SELECTION FOR OTHERS'.
    DATA LIST FREE/Y X1 X2 X3 X4 X5.
    BEGIN DATA.
        DATA LINES
    END DATA.
    REGRESSION VARIABLES = Y X1 X2 X3 X4 X5/
        DEPENDENT = Y/
(1)     ENTER X3/ENTER X4/STEPWISE/.

SAS REG

    DATA FORCEPR;
    INPUT Y X1 X2 X3 X4 X5;
    CARDS;
        DATA LINES
    PROC REG SIMPLE CORR;
(2)     MODEL Y = X3 X4 X1 X2 X5/INCLUDE = 2 SELECTION = STEPWISE;


(1) These two ENTER subcommands will force the predictors in the specific order indicated. Then
the STEPWISE subcommand will determine whether any of the remaining predictors (X1, X2, or X5)
have semipartial correlations large enough to be "significant." If we wished to force in predictors X1,
X3, and X4 and then use STEPWISE, the subcommand is ENTER X1 X3 X4/STEPWISE/

(2) The INCLUDE = 2 forces the first 2 predictors listed in the MODEL statement into the prediction
equation. Thus, if we wish to force X3 and X4 we must list them first on the MODEL statement.


Indeed, Schutz did find that the R's in the homogeneous subsamples were on the
average .34 greater than in the total samples; however, this was an artifact of the
multiple regression procedure in this case. As Schutz went from the total samples to his
subsamples, the number of predictors (k) approached sample size (n). For this
situation the multiple correlation increases to 1 regardless of whether there is any
relationship between y and the set of predictors. And in 3 of 4 of Schutz's subsamples
the n/k ratios became dangerously close to 1. In particular, it is the case that E(R²) =
k/(n − 1) when the population multiple correlation = 0 (Morrison, 1976).

To dramatize this, consider subsample 1 for the principals. Then E(R²) = 24/34
= .706, even when there is no relationship between y and the set of predictors. The
critical F value required just for statistical significance of R at the .05 level is 2.74, which
implies R² > .868, just to be confident that the population multiple correlation is
different from 0!
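
The two figures just quoted follow directly from k and n. The Python sketch below is ours: the function names are illustrative, the critical value 2.74 is taken from the text rather than computed, and the second function simply inverts the usual F statistic for the multiple correlation, F = (R²/k) / ((1 − R²)/(n − k − 1)), which we assume is the test behind that critical value.

```python
def expected_r2_under_null(n, k):
    """E(R^2) = k / (n - 1) when the population multiple correlation is 0."""
    return k / (n - 1)


def min_r2_for_significance(f_crit, n, k):
    """Smallest sample R^2 whose F statistic reaches the critical value f_crit."""
    return k * f_crit / (k * f_crit + (n - k - 1))


# Subsample 1 for the principals: n = 35 subjects, k = 24 predictors.
n, k = 35, 24
f_crit = 2.74  # critical value quoted in the text for the .05 level

chance_r2 = expected_r2_under_null(n, k)
needed_r2 = min_r2_for_significance(f_crit, n, k)
print(f"E(R^2) under no relationship: {chance_r2:.3f}")
print(f"R^2 needed for significance:  {needed_r2:.3f}")
```

Trying larger values of n with k held at 24 shows how quickly the chance-level R² falls as the n/k ratio improves.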
6.14 OUTLIERS AND INFLUENTIAL DATA POINTS


Since multiple regression is a mathematical maximization procedure, it can be 
very sensitive to data points that "split off" or are different from the rest of the
points, that is, to outliers. Just 1 or 2 such points can affect the interpretation of re- 
sults, and it is certainly moot as to whether 1 or 2 points should be permitted to 
have such a profound influence. Therefore, it is important to be able to detect outli- 
ers and influential points. There is a distinction between the two because a point 
that is an outlier (either on y or for the predictors) will not necessarily be influential 
in affecting the regression equation. 

There are two basic approaches that can be used in dealing with outliers and in- 
fluential points. We consider the approach of having an arsenal of tools for isolat- 
ing these important points for further study, with the possibility of deleting some or 
all of the points from the analysis. The other approach is to develop procedures that 
are relatively insensitive to wild points (i.e., robust regression techniques).


Data Editing 


Outliers and influential cases can occur because of recording errors. Consequently, 
researchers should give more consideration to the data editing phase of the data 
analysis process (i.e., always listing the data and examining the list for possible er- 
rors). There are many possible sources of error, from the initial data collection to 
the final keypunching. First, some of the data may have been recorded incorrectly. 
Second, even if recorded correctly, when all of the data are transferred to a single 
sheet or a few sheets in preparation for keypunching, errors may be made. Finally, 
even if no errors are made in these first two steps, an error(s) could be made in en- 
tering the data into the terminal. 

There are various statistics for identifying outliers on y and on the set of predic- 
tors, as well as for identifying influential data points. We discuss first, in brief 
form, a statistic for each, with advice on how to interpret that statistic. Equations 
for the statistics are given in my multivariate text (Stevens, 1996), along with a 
more extensive and somewhat technical discussion for those who are interested. 


Measuring Outliers on y 


For finding subjects whose predicted scores are quite different from their actual y 
scores (i.e., they do not fit the model well), the standardized residuals (r_i) can be
used. If the model is correct, then they have a normal distribution with a mean of 0
and a standard deviation of 1. Thus, about 95% of the r_i should lie within two
standard deviations of the mean and about 99% within three standard deviations.
Therefore, any standardized residual greater than about 3 in absolute value is un- 
usual and should be carefully examined. 




Measuring Outliers on Set of Predictors 


The hat elements (h_ii) can be used here. It can be shown that the hat elements lie
between 0 and 1, and that the average hat element is p/n, where p = k + 1. Because of
this, Hoaglin and Welsch (1978) suggest that 2p/n may be considered large.
However, this can lead to more points than we really would want to examine, and the
reader should consider using 3p/n. For example, with 6 predictors and 100 sub- 
jects, any hat element (also called leverage) greater than 3(7)/100 = .21 should be 
carefully examined. This is a very simple and useful rule of thumb for quickly 
identifying subjects who are very different from the rest of the sample on the set of 
predictors. 


Measuring Influential Data Points 


An influential data point is one that when deleted produces a substantial change in 
at least one of the regression coefficients. That is, the prediction equations with and 
without the influential point are quite different. Cook’s distance (1977) is very use- 
ful for identifying influential points. It measures the combined influence of the 
case being an outlier on y and on the set of predictors. Cook and Weisberg (1982) 
have indicated that a Cook’s distance > 1 would generally be considered large. 
This provides a "red flag," when examining computer printout, for identifying
influential points.

All of the above diagnostic measures are easily obtained from SPSS 
REGRESSION (cf. Table 6.3) or SAS REG (cf. Table 6.6). 
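
For readers who want to see how the three diagnostics fit together, here is a minimal Python sketch. It is ours, not output from SPSS or SAS: the data are hypothetical, the helper names are illustrative, and the standardized residual is taken in its leverage-adjusted form, e_i / sqrt(MSE(1 − h_ii)), which is an assumption about the definition (packages offer more than one version). Cook's distance is then r_i² h_ii / (p(1 − h_ii)).

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def diagnostics(x_rows, y):
    """Return (standardized residuals, hat elements, Cook's distances)."""
    design = [[1.0] + [float(v) for v in row] for row in x_rows]  # add intercept
    n, p = len(design), len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    coefs = solve(xtx, xty)
    resid = [yi - sum(c * v for c, v in zip(coefs, r)) for r, yi in zip(design, y)]
    mse = sum(e * e for e in resid) / (n - p)
    # h_ii = x_i' (X'X)^-1 x_i, obtained by solving (X'X) z = x_i for each case.
    hats = [sum(v * z for v, z in zip(r, solve(xtx, r))) for r in design]
    std_resid = [e / math.sqrt(mse * (1 - h)) for e, h in zip(resid, hats)]
    cooks = [r * r * h / (p * (1 - h)) for r, h in zip(std_resid, hats)]
    return std_resid, hats, cooks


# Hypothetical data: nine well-behaved cases plus one case far out on x.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 8.2, 8.8, 10.1, 5.0]
std_resid, hats, cooks = diagnostics([[v] for v in x], y)

p, n = 2, len(y)
for case, (r, h, d) in enumerate(zip(std_resid, hats, cooks), start=1):
    flags = []
    if abs(r) > 3:
        flags.append("outlier on y")
    if h > 3 * p / n:
        flags.append("outlier on predictors")
    if d > 1:
        flags.append("influential")
    print(f"case {case:2d}  r = {r:6.2f}  h = {h:.3f}  D = {d:6.3f}  {', '.join(flags)}")
```

In this made-up data set only the last case exceeds the 3p/n and Cook's distance cutoffs, which illustrates the point that influence depends jointly on leverage and on the size of the residual.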


6.15 FURTHER DISCUSSION OF THE TWO 
COMPUTER EXAMPLES 


Morrison Data 


Recall that for Morrison data the stepwise procedure yielded the more parsimoni- 
ous model involving 3 predictors: CLARITY, INTEREST, and STIMUL. If we 
were interested in an estimate of the predictive power in the population, then the 
Wherry estimate given by Equation 5 is appropriate. This is given in Table 6.4 un- 
der model 3 as ADJUSTED R SQUARE .840. Here the estimate is used in a de- 
scriptive sense; to describe the relationship in the population. However, if we are 
interested in the cross-validity predictive power, then the Stein estimate (Equation 
6) should be used. The Stein adjusted R² in this case is


\[ \hat{\rho}_c^2 = 1 - (31/28)(30/27)(33/32)(1 - .856) = .82 \]

This estimates that if we were to cross-validate the prediction equation on many
other samples from the same population, then on the average we would account for 
about 82% of the variance on the dependent variable. In this instance the estimated 
dropoff in predictive power is very little from the maximized value of 85.56%. The 
reason is that the association between the dependent variable and the set of predic- 
tors is very strong. Thus, we can have confidence in the future predictive power of 
the equation. 

It is also important to examine the regression diagnostics to check for any outli- 
ers and/or influential data points. Table 6.10 presents the appropriate statistics, as 
discussed in Section 6.14, for identifying outliers on the dependent variable (stan- 
dardized residuals), outliers on the set of predictors (hat elements), and influential 
data points (Cook’s distance). 

First, we would expect only about 5% of the standardized residuals to be greater than 2 in
absolute value if the linear model is appropriate. From Table 6.10 we see that 2 of the ZRESID
exceed 2 in absolute value, and we would expect about 32(.05) = 1.6, so nothing seems to be awry here.
Next, we check for outliers on the set of predictors. The rough “critical value” here 
is 3p/n = 3(4)/32 = .375. Since there are no values under LEVER in Table 6.10 ex- 
ceeding this value, we have no outliers on the set of predictors. Finally, and per- 
haps most importantly, we check for the existence of influential data points using 
Cook's D. Recall that Cook and Weisberg (1982) have suggested that if D > 1, then the point is
influential. All the Cook Ds in Table 6.10 are far less than 1, so we have no influential data
points. 

In summary then, the linear regression model is quite appropriate for the Morri- 
son data. The estimated cross validity power is excellent, and there are no outliers 
or influential data points. 


National Academy of Sciences Data 


Recall that both the stepwise procedure and the MAXR procedure yielded the 
same "best" 4-predictor set: NFACUL, PCTSUPP, PCTGRT, and NARTIC. The
maximized R² = .8221, indicating that 82.21% of the variance in quality can be
accounted for by these 4 predictors in this sample. Now we obtain two measures of
the cross-validity power of the equation. First, from the SAS REG printout, we
have PREDICTED RESID SS (PRESS) = 1350.33. Furthermore, the variance for
QUALITY is 101.438, so that Σ(Y_i − Ȳ)² = 45(101.438) = 4564.71. From these numbers we can
compute


\[ R_{Press}^2 = 1 - (1350.33)/4564.71 = .7042 \]


This is a good measure of the external predictive power of the equation, where 
we have n validations, each based on (n − 1) observations.
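
The PRESS-based R² above needs only two numbers from the printout. The Python sketch below is ours (the function names are illustrative). The second function uses the standard identity that each deleted residual equals e_i / (1 − h_ii), so PRESS can be obtained from one fit rather than n separate fits; it is included only to show where the PRESS value comes from and is demonstrated on two made-up residuals.

```python
def press_r2(press, total_ss):
    """Cross-validity R^2 based on the PRESS statistic."""
    return 1 - press / total_ss


def press_from_residuals(residuals, hats):
    """PRESS as the sum of squared deleted residuals, e_i / (1 - h_ii)."""
    return sum((e / (1 - h)) ** 2 for e, h in zip(residuals, hats))


# Values from the SAS REG printout for the National Academy of Sciences data.
n = 46
variance_quality = 101.438
total_ss = (n - 1) * variance_quality  # sum of squares about the mean
r2_press = press_r2(1350.33, total_ss)
print(f"Total SS = {total_ss:.2f}   PRESS R^2 = {r2_press:.4f}")

# Made-up illustration of the deleted-residual identity.
demo_press = press_from_residuals([1.0, -1.0], [0.5, 0.5])
```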
The Stein estimate of how much variance on the average we would account for
if the equation were applied to many other samples is


\[ \hat{\rho}_c^2 = 1 - (45/41)(44/40)(47/46)(1 - .822) = .7804 \]


Now we turn to the regression diagnostics from SAS REG, which are presented 
in Table 6.11. In terms of the standardized residuals for y, there are two that stand 
out (−3.0154 and 2.5276 for observations 25 and 44). These are for the University
of Michigan and Virginia Polytech. In terms of outliers on the set of predictors, us- 
ing 2p/n = 2(5)/46 = .217, there are outliers for observation 15 (University of Geor- 
gia), observation 25 (University of Michigan again), and observation 30 (North- 
eastern). 

Using the criterion of Cook D > 1, there is one influential data point, observa- 
tion 25 (University of Michigan). Recall that whether a point will be influential is a 
joint function of being an outlier on y and on the set of predictors. In this case, the 
University of Michigan definitely doesn’t fit the model and it differs dramatically 
from the other psychology departments on the set of predictors. A check of the 
DFBETAS reveals that it is very different in terms of number of faculty (DFBETA
= −2.7653), and a scan of the raw data shows the number of faculty at 111, while the
average number of faculty members for all the departments is only 29.5. The ques- 
tion needs to be raised as to whether the University of Michigan is “counting” fac- 
ulty members in a different way from the rest of the schools. For example, are they 
including part time and adjunct faculty, and if so, is the number of these quite 
large? 

For comparison purposes, the analysis was also run with the University of 
Michigan deleted. Interestingly, the same 4 predictors emerge from the stepwise 
procedure, although the results are better in some ways. For example, Mallows' Cp
is now 4.5248, whereas for the full data set it was 5.216. Also, the PRESS residual
sum of squares is now only 899.92, whereas for the full data set it was 1350.33. 


6.16 SAMPLE SIZE DETERMINATION FOR A RELIABLE 
PREDICTION EQUATION 


The reader may recall that in power analysis one is interested in determining a pri- 
ori how many subjects are needed per group to have, say, power = .80 at the .05 
level. Thus, planning is done ahead of time to ensure that one has a good chance of 
detecting an effect of a given magnitude. Now, in multiple regression the focus is 
different and the concern, or at least one very important concern, is development of 
a prediction equation that has generalizability. A study by Park and Dudycha 
(1974) provides several tables that, given certain input parameters, enable one to 
determine how many subjects will be needed for a reliable prediction equation. 


TABLE 6.10 
Regression Diagnostics (Standardized Residuals, Hat Elements, 
and Cook’s Distance) for Morrison MBA Data 


Placeholder for T0610 from p. 284 of previous edition 


(1) These are the predicted values.

(2) These are the raw residuals, that is, ê_i = y_i − ŷ_i. Thus, for the first subject we have
ê_1 = 1 − 1.1156 = −.1156.

(3) These are the standardized residuals.

(4) The hat elements; they have been called leverage elements elsewhere, hence the abbreviation
LEVER.
They considered from 3 to 25 random variable predictors, and found that with
about 15 subjects per predictor the amount of shrinkage is small (<.05) with high
probability (.90), if the squared population multiple correlation (ρ²) is .50. In Table
6.12 we present selected results from the Park and Dudycha study for 3, 4, 8, and
15 predictors.

To use Table 6.12 we need an estimate of ρ², that is, the squared population
multiple correlation. Unless an investigator has a good estimate from a previous
study that used similar subjects and predictors, we feel taking ρ² = .50 is a
reasonable guess for social science research. In the physical sciences, estimates >.75 are
quite reasonable. If we set ρ² = .50 and want the loss in predictive power to be less
than .05 with probability = .90, then the required sample sizes are as follows:


                          Number of Predictors

ρ² = .50, ε = .05         3       4       8       15
n                        50      66     124      214
n/k ratio              16.7    16.5    15.5     14.3


The n/k ratios in all 4 cases are around 15/1. 

We had indicated earlier that generally about 15 subjects per predictor are 
needed for a reliable regression equation in the social sciences, that is, an equation 
that will cross-validate well. There are three converging lines of evidence that sup- 
port this conclusion: 


1. The Stein formula for estimated shrinkage (Table 6.8). 
2. My own experience. 
3. The results just presented from the Park and Dudycha study. 


However, the Park and Dudycha study (cf. Table 6.12) clearly shows that
the magnitude of ρ (population multiple correlation) strongly affects how
many subjects will be needed for a reliable regression equation. For example,
if ρ² = .75, then for 3 predictors only 28 subjects are needed, whereas 50
subjects were needed for the same case when ρ² = .50.

Also, from the Stein formula (Table 6.8), you will see if you plug in .40 for R2 
that more than 15 subjects per predictor will be needed to keep the shrinkage fairly 
small, while if you insert .70 for R2, significantly less than 15 will be needed. 


TABLE 6.11
Regression Diagnostics (Standardized Residuals, Hat Elements,
and Cook's Distance) for National Academy of Science Data


6.17 ANOVA AS A SPECIAL CASE OF REGRESSION
ANALYSIS


This section is presented to show that ANOVA is just a special case of regression
analysis, i.e., the general linear model. Cohen's (1968) seminal article was primarily
responsible for bringing the general linear model to the attention of social science
researchers. The regression approach to ANOVA is accomplished by dummy
coding group membership. We will illustrate with two examples that were analyzed
in Chapter 2 with traditional ANOVA. The first example had 3 groups, with
the following data:


GROUP 1    GROUP 2    GROUP 3
   3          4          4
   6          7          5
   8          9          2
              8          3
                         5


We create two dummy variables (DUM1 and DUM2) to identify group membership,
and use a 1 on the dummy variable to indicate group membership. The entities
in the third group are uniquely identified by 0 and 0 on the two dummy variables,
i.e., not in groups 1 or 2. Thus, we have


DEP   DUM1   DUM2        DEP   DUM1   DUM2

 3     1      0           4     0      0
 6     1      0           5     0      0
 8     1      0           2     0      0
 4     0      1           3     0      0
 7     0      1           5     0      0
 9     0      1
 8     0      1


The second example had four groups, with the following data: 


GROUP 1    GROUP 2    GROUP 3    GROUP 4
   2          7          4          8
   3          9          4          4
   5         11          5          7
   6                     8          7
                         3


In this case we need 3 dummy variables to identify group membership (DUM1,
DUM2, and DUM3):

DEP   DUM1   DUM2   DUM3
 2     1      0      0
 3     1      0      0
 5     1      0      0
 6     1      0      0
 7     0      1      0
 9     0      1      0
11     0      1      0
 4     0      0      1
 4     0      0      1
 5     0      0      1
 8     0      0      1
 3     0      0      1
 8     0      0      0
 4     0      0      0
 7     0      0      0
 7     0      0      0


Note that again the subjects in the last group (4th group here) are identified by
0s on all dummy variables, i.e., not in groups 1, 2, or 3. In general, we need (k − 1)
dummy variables for k groups.

When the above two data sets were run on SPSS for Windows 7.5 as regression
analyses, predicting the dependent variable from group membership (DUM1 and
DUM2 were the predictors in the first analysis and DUM1, DUM2, and DUM3
were the predictors in the second analysis), the results were as follows:


Placeholder for unnumbered tables p. 291 of previous edition 
Note that the results are identical to what was obtained in Chapter 2. The mean
square due to regression corresponds to mean square between, while the residual 
corresponds to mean square error. The mean square due to regression is just vari- 
ability due to group membership. We will see in the next chapter, on analysis of 
covariance (which combines ANOVA and regression analysis), that analysis of 
covariance can be done through regression analysis also. 
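
The equivalence can also be verified by hand for the first (3 group) example. The Python sketch below is ours and is not a reproduction of the SPSS run: it fits the dummy coded regression by solving the normal equations with a small hand-written solver (the helper names are illustrative), and then computes the traditional between and within mean squares from the group means for comparison.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


# Scores for the three groups of the first example.
groups = [[3, 6, 8], [4, 7, 9, 8], [4, 5, 2, 3, 5]]

# Dummy coding: DUM1 = 1 for group 1, DUM2 = 1 for group 2; group 3 is (0, 0).
design, y = [], []
for index, scores in enumerate(groups):
    for score in scores:
        design.append([1.0, float(index == 0), float(index == 1)])
        y.append(float(score))

# Regression approach: least squares fit via the normal equations.
n, p = len(y), len(design[0])
xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
coefs = solve(xtx, xty)
fitted = [sum(c * v for c, v in zip(coefs, r)) for r in design]

grand_mean = sum(y) / n
ms_regression = sum((f - grand_mean) ** 2 for f in fitted) / (p - 1)
ms_residual = sum((yi - f) ** 2 for yi, f in zip(y, fitted)) / (n - p)
f_regression = ms_regression / ms_residual

# Traditional ANOVA approach: between and within mean squares from group means.
means = [sum(g) / len(g) for g in groups]
ms_between = sum(len(g) * (m - grand_mean) ** 2
                 for g, m in zip(groups, means)) / (len(groups) - 1)
ms_within = sum((v - m) ** 2
                for g, m in zip(groups, means) for v in g) / (n - len(groups))
f_anova = ms_between / ms_within

print(f"Regression: MS reg = {ms_regression:.3f}, MS res = {ms_residual:.3f}, F = {f_regression:.3f}")
print(f"ANOVA:      MS b   = {ms_between:.3f}, MS w   = {ms_within:.3f}, F = {f_anova:.3f}")
```

With this coding the intercept is the mean of the group coded (0, 0), and each dummy variable's coefficient is the difference between that group's mean and the mean of the reference group.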


6.18 SUMMARY OF IMPORTANT POINTS 


1. A particularly good situation for multiple regression is where each of the 
predictors is correlated with y and the predictors have low intercorrelations, for 
then each of the predictors is accounting for a relatively distinct part of the variance 
on y. 

2. Moderate to high correlations among the predictors (multicollinearity)
create three problems: they (a) severely limit the size of R, (b) make determining the
importance of a given predictor difficult, and (c) increase the variance of regression
coefficients, making for an unstable prediction equation. One way of combating 
this problem is to combine into a single measure a set of predictors that are highly 
correlated. 

3. Preselecting a small set of predictors by examining a correlation matrix 
from a large initial set, or by using one of the stepwise procedures (forward, step- 
wise, backward) to select a small set, is likely to produce an equation that is sample 
specific. If one insists on doing this, and I do not recommend it, then the onus is on 
the investigator to demonstrate that the equation has adequate predictive power be- 
yond the derivation sample. 

4. Mallows' Cp was presented as a measure that minimizes the effect of
underfitting (important predictors left out of the model) and overfitting (having
predictors in the model that make essentially no contribution or are marginal). This
will be the case if one chooses models for which Cp ≈ p.

5. With many data sets, more than one model will provide a good fit to the 
data. Thus, one deals with selecting a model from a pool of candidate models. 

6. There are various graphical plots for assessing how well the model fits the 
assumptions underlying linear regression. One of the most useful graphs is the plot of the
standardized residuals (y axis) versus the predicted values (x axis). If the assumptions
are tenable, then one should observe roughly a random scattering. Any systematic 
clustering of the residuals indicates a model violation(s). 

7. It is crucial to validate the model(s) by either randomly splitting the sample
and cross-validating, or using the PRESS statistic, or by obtaining the Stein esti- 
mate of the average predictive power of the equation on other samples from the 
same population. Studies in the literature that have not cross-validated should be 
checked with the Stein estimate to assess the generalizability of the prediction
equation(s) presented. 

8. Results from the Park and Dudycha study indicate that the magnitude of the 
population multiple correlation strongly affects how many subjects will be needed 
for a reliable prediction equation. If your estimate of the squared population value 
is .50, then about 15 subjects per predictor are needed. On the other hand, if your
estimate of the squared population value is substantially larger than .50, then far
fewer than 15 subjects per predictor will be needed. Table 6.8 shows that if R² = .75,
then 10 subjects per predictor will yield a reliable equation. If R2 = .85 (very 
strong) then five subjects per predictor is enough. 

9. Influential data points, that is, points that strongly affect the prediction 
equation, can be identified by seeing which cases have Cook distances >1. These 
points need to be examined very carefully. If such a point is due to a recording er- 
ror, then one would simply correct it and redo the analysis. Or if it is found that the 
influential point is due to an instrumentation error or that the process that gener- 
ated the data for that subject was different, then it is legitimate to drop the case 
from the analysis. If, however, none of these appears to be the case, then one should 
not drop the case, but perhaps report the results of several analyses: one analysis 
with all the data and an additional analysis(ses) with the influential point(s) de- 
leted. 

10. It was shown that analysis of variance can be considered as a special case 
of regression analysis by dummy coding group membership. 


EXERCISES 


1. Consider this set of data: 


x y 
2 3 
3 6 
4 8 
6 4 
7 10 
8 14 
9 8 
10 12 
11 14 
12 12 
13 16 


(a) Plot the data. Does there appear to be a linear relationship?
(b) Run this data on SPSS, obtaining the case analysis. 
(c) Do you see any pattern in the plot of the standardized residuals? What
does this suggest? 

(d) Sketch in the regression line, and indicate the raw residuals by vertical 
lines. 


2. Consider the following small set of data: 


PREDX DEP 
0 1 
1 4 
2 6 
3 8 
4 9 
5 10
6 10 
7 8 
8 7 
9 6 
10 5 


(a) Plot the points. What type of relationship does this suggest? 

(b) Run this data on SPSS, forcing the predictor in and obtaining the case 
analysis. 

(c) Do you see any pattern in the plot of the standardized residuals? What 
does this suggest? 


3. Consider the following correlation matrix: 


        y      x1     x2
y     1.00    .60    .50
x1     .60   1.00    .80
x2     .50    .80   1.00


(a) How much variance on y will x1 account for if entered first?

(b) How much variance on y will x1 account for if entered second?

(c) What, if anything, do the above results have to do with the multicolli- 
nearity problem? 


4. A medical school admissions official has two proven predictors (x1 and x2)
of success in medical school. He has two other predictors under
consideration (x3 and x4), of which he wishes to choose just one which will add the
most (beyond what x1 and x2 already predict) to predicting success. Below
is the matrix of intercorrelations he has gathered on a sample of 100 medi- 
cal students: 
       x1     x2     x3     x4
y     .60    .55    .60    .46
x1           .70    .60    .20
x2                  .80    .30
x3                         .60


(a) What procedure would he use to determine which predictor has the 
greater incremental validity? Do not go into any numerical details, just in- 
dicate the general procedure. Also, what is your educated guess as to which 
predictor (x3 or x4) will probably have the greater incremental validity?


5. Consider the following random sample (in the following table) of about
50% from the Agresti data (in Appendix A in the back of the book) on 
home sales in Florida. We wish to predict PRICE from the other 4 variables 
as predictors. The other variables are NEW (whether the home was new or 
not), NOBATH (number of bathrooms), NOBED (number of bedrooms) 
and SIZE (size of the house). 

(a) Run stepwise regression analysis on this data. What model is selected? 
(b) Run backward elimination on this data. What model is selected? 


6. An investigator has 15 variables on a file. Denote them by x1, x2, x3, . . ., x15.
Assume there are spaces between all variables, so that free format can be 
used to read the data. The investigator wishes to predict x4. First, however, 
he obtains the correlation matrix among the predictors and finds that vari- 
ables 7 and 8 are highly correlated, and decides to combine those as a sin- 
gle predictor. He will also use variables 1, 3, 11, 12, 13, and 14 as predic- 
tors. Show the set of control lines for running a stepwise analysis and also 
obtaining a scatterplot of the residuals vs predicted values of y. 


7. A different investigator has 8 variables on a file, with no spaces between
the variables, so that fixed format will be needed to read the data. The data 
looks as follows: 


2534674823178659 
3645738234267583 
ETC. 


The first two variables are single digit integers, the next three variables are 
two digit integers, the next two variables are three digit integers and the 8th 
variable is a two digit integer. The 8th variable is the dependent variable. 
She wishes to force in variables 1 and 2, and then determine whether vari- 
ables 3 through 5 (as a block) have any incremental validity. Show the com- 
plete SPSS REGRESSION control lines for doing this analysis. 


Case Summaries (a)

Case     NEW    NOBATH   NOBED    PRICE     SIZE
  1      .00     1.00     3.00     48.50    1.10
  2      .00     2.00     3.00     55.00    1.01
  3      .00     3.00     3.00    137.00    2.40
  4     1.00     3.00     4.00    309.40    3.30
  5      .00     1.00     3.00     19.80    1.28
  6      .00     1.00     3.00     24.50     .74
  7      .00     1.00     2.00     34.80     .78
  8      .00     1.00     3.00     32.00    OT
  9      .00     1.00     3.00     28.00    184
 10      .00     2.00     2.00     49.90    1.08
 11      .00     2.00     3.00     61.50    1.01
 12      .00     2.00     3.00     68.90    1.29
 13      .00     2.00     3.00     70.50    1.25
 14      .00     2.00     3.00     72.90    1.28
 15      .00     2.00     3.00     72.00    1.36
 16      .00     2.00     3.00     71.00    1.20
 17      .00     2.00     3.00     73.00    1.22
 18      .00     2.00     2.00     70.00    1.40
 19      .00     2.00     2.00     76.00    1.15
 20      .00     2.00     3.00     75.50    1.62
 21      .00     2.00     3.00     76.00    1.68
 22      .00     2.00     3.00     81.80    1.33
 23      .00     2.00     3.00     84.50    1.34
 24     1.00     2.00     3.00     86.90    1.58
 25      .00     2.00     3.00     88.10    2.10
 26      .00     2.00     3.00     89.50    1.34
 27      .00     2.00     3.00     90.00    1.55
 28     1.00     2.00     3.00     95.50    1.54
 29     1.00     2.00     4.00     99.90    1.62
 30     1.00     2.00     3.00    102.30    1.42
 31     1.00     2.00     3.00    110.80    1.56
 32     1.00     2.00     3.00     97.90    2.00
 33     1.00     2.00     3.00    106.30    1.45
 34      .00     2.00     3.00    106.50    1.65
 35     1.00     2.00     4.00    109.90    2.06
 36      .00     2.00     4.00    110.00    1.76
 37     1.00     2.00     4.00    115.00    1.80
 38      .00     2.00     3.00    114.90    1.57
 39      .00     2.00     4.00    115.00    2.07
 40      .00     2.00     4.00    117.90    1.99
 41      .00     2.00     3.00    110.00    1.55
 42     1.00     2.00     3.00    128.00    1.88
 43     1.00     2.00     4.00    139.30    2.05
 44      .00     3.00     3.00    142.00    2.12
 45      .00     2.00     5.00    148.00    2.40
 46      .00     3.00     3.00    150.00    2.04
Total N   46

(a) Limited to first 100 cases.
8. A regression analysis was run on the Sesame St (n = 240) data set, predicting
postbody from the following 5 pretest measures: prebody, prelet, preform,
prenumb and prerelat. This was run in the syntax editor on SPSS for
Windows 12.0. The control lines for doing a stepwise regression, obtaining
the 10 largest values for the standardized residuals, the hat elements and
Cook’s distance, and for obtaining a plot of the standardized residuals versus
the predicted y values are given below:


TITLE 'MULT REG ON POSTBODY-5 PREDICTORS'.
DATA LIST FREE/ID SITE SEX AGE VIEWCAT SETTING VIEWENC
   PREBODY PRELET PREFORM
   PRENUMB PRERELAT PRECLASF POSTBODY POSTLET POSTFORM
   POSTNUMB POSTREL
   POSTCLAS PEABODY.
BEGIN DATA.
   DATA LINES.
END DATA.
REGRESSION DESCRIPTIVES = DEFAULT/
   VARIABLES = PREBODY TO PRERELAT POSTBODY/
   STATISTICS = DEFAULTS TOL SELECTION/
   DEPENDENT = POSTBODY/
   METHOD = STEPWISE/
   RESIDUALS = OUTLIERS (ZRESID, LEVER, COOK)/
   SCATTERPLOT (*RES,*PRE)/.


The SPSS Windows 7.5 printout follows. Answer the following questions: 
(a) Why did PREBODY enter the prediction equation first? 

(b) Why did PREFORM enter the prediction equation second? 

(c) Write the prediction equation, rounding off to 3 decimals. 

(d) Is multicollinearity present? Explain. 

(e) Compute the Stein estimate and indicate in words exactly what it repre- 
sents. 

(f) Refer to the standardized residuals. Is the number of these greater than
|2| about what you would expect if the model is appropriate? Why, or why
not?

(g) Are there any outliers on the set of predictors?

(h) Are there any influential data points? Explain. 

(i) From examination of the residual plot, does it appear there may be 
some model violation(s)? Why, or why not? 

(j) Are the values of VIF (variance inflation factor) for the predictors in the
equation reasonable, according to Myers?

(k) Does the value of Mallows prediction criterion for model 2 seem reasonable?
What about for model 1?
[Placeholder for the SPSS printout from p. 303 of previous edition.]


9. Show how the partial correlation of .459 is obtained for COUEVAL in 
MODEL 1 under EXCLUDED VARIABLES for the MORRISON data. 


10. Run a stepwise regression analysis for the full AGRESTI data on the CD. 


11. Run backward selection on the full AGRESTI data. Do you get the same 
model? 


APPENDIX 
THE PRESS STATISTIC 


As pointed out by several authors, in many instances one does not have enough 
data to do a random split. One can obtain a good measure of the external predictive 
power by use of the PRESS statistic. In this approach the y value for each subject is 
set aside and a prediction equation is derived on the remaining data. Thus, n prediction
equations are derived and n true prediction errors are found. To be very specific,
the prediction error for subject 1 is computed from the equation derived on
the remaining (n − 1) data points, the prediction error for subject 2 is computed
from the equation derived on the other (n − 1) data points, etc. As Myers (1990) put
it, “PRESS is important in that one has information in the form of n validations in
which the fitting sample for each is of size n − 1” (p. 171).

The PRESS statistic is especially important when one does not have large sam- 
ple size, for in this case data splitting is really not practical. For example, if n = 60 
and we have 6 predictors, randomly splitting the sample involves obtaining a pre- 
diction equation on only 30 subjects. 

Recall that in deriving the prediction (via the least squares approach), the sum 
of the squared errors is minimized. The PRESS residuals, on the other hand, are 
true prediction errors, since the y value for each subject was not simultaneously 
used for fit and model assessment. Let us denote the predicted value for subject i, 
where that subject was not used in developing the prediction equation, by $\hat{y}_{(-i)}$.
Then the PRESS residual for each subject is given by


$$\hat{e}_{(-i)} = y_i - \hat{y}_{(-i)}$$

and the PRESS sum of squared residuals is given by

$$\text{PRESS} = \sum \hat{e}_{(-i)}^2$$


Therefore, one might prefer the model with the smallest PRESS value. The
above PRESS value can be used to calculate an $R^2$-like statistic that more accurately
reflects the generalizability of the model. It is given by


$$R^2_{\text{PRESS}} = 1 - \frac{\text{PRESS}}{\sum (y_i - \bar{y})^2}$$


Importantly, the SAS REG program does routinely print out PRESS, although it
is called PREDICTED RESID SS (PRESS). Given this value, it is a simple matter
to calculate the $R^2_{\text{PRESS}}$ statistic, since $s_y^2 = \sum (y_i - \bar{y})^2/(n - 1)$, so that the
denominator is simply $(n - 1)s_y^2$.
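
To make the leave-one-out idea concrete, here is a minimal Python sketch for the one-predictor case. The data values and the function names (`fit_line`, `press`, `r2_press`) are hypothetical and are used only for illustration; they do not come from any data set in this book.

```python
def fit_line(x, y):
    """Least squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def sse(x, y):
    """Ordinary sum of squared residuals (every subject used for the fit)."""
    a, b = fit_line(x, y)
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def press(x, y):
    """Sum of squared PRESS residuals: subject i is set aside for equation i."""
    total = 0.0
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        a, b = fit_line(x_rest, y_rest)      # equation derived on n - 1 subjects
        total += (y[i] - (a + b * x[i])) ** 2
    return total


def r2_press(x, y):
    """R-squared-like statistic based on PRESS."""
    my = sum(y) / len(y)
    ss_total = sum((yi - my) ** 2 for yi in y)
    return 1 - press(x, y) / ss_total


# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9]

print("SSE      =", round(sse(x, y), 4))
print("PRESS    =", round(press(x, y), 4))
print("R2 PRESS =", round(r2_press(x, y), 4))
```

PRESS can never be smaller than the ordinary sum of squared residuals, which is why the PRESS-based statistic gives a more conservative picture of predictive power than the usual squared multiple correlation.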
Analysis of Covariance


CONTENTS 

7.1 Introduction 

7.2 Purposes of Covariance 

7.3 Adjustment of Posttest Means 

7.4 Reduction of Error Variance 

7.5 Choice of Covariates 

7.6 Numerical Example 

7.7 Assumptions in Analysis of Covariance 
7.8 Use of ANCOVA with Intact Groups 

7.9 Computer Example for ANCOVA 

7.10 Alternative Analyses 

7.11 An Alternative to the Johnson-Neyman Technique 
7.12 Use of Several Covariates 

7.13 Computer Example with Two Covariates 
7.14 Summary 


7.1 INTRODUCTION 


In Chapter 4 we examined the effect of two or more independent variables (factors) 
in explaining variation on the dependent variable. We set up an experimental de- 
sign, and thus this method is called experimental control. In this chapter we con- 
sider explaining variation on the dependent variable by measuring the subjects on 
some other variable(s), called covariates, that are correlated with the dependent 
variable. Recall that the square of a correlation can be interpreted as “proportion of 
variance accounted for.” Thus, if we find that I.Q. is correlated with achievement
(dependent variable), say .60, we will be able to attribute 36% of the within group
variance on the dependent variable to variability on I.Q. In analysis of covariance 
(ANCOVA), this part of the variance is removed from the error term, and yields a 
more powerful test. This method of explaining variation is called statistical con- 
trol. We now consider an example to illustrate how ANCOVA can be very useful in 
an experimental study in which the subjects have been randomly assigned to 
groups. 


Example 


Suppose an investigator is comparing the effects of two treatments on achieve- 
ment in science. He assesses achievement through the use of a 50 item multiple 
choice test. He has 24 students and is able to randomly assign 12 of them to each 
of the treatments. I.Q. scores are also available for these subjects. The data are as 
follows: 


            Treat. 1             Treat. 2
          I.Q.    Ach.         I.Q.    Ach.
          100     23            96     19
          113     31           108     26
           98     35           122     31
          110     28           103     22
          124     40           132     36
          135     42           120     38
          118     37           111     31
           93     29            93     25
          120     34           115     29
          127     45           125     41
          115     33           102     27
          104     25           107     21
Means  113.08   33.5        111.17   28.83


The investigator feels no need to use the I.Q. data for analysis purposes since 
the groups have been “equated” on all variables because of the random assignment. 
He therefore runs a t test for independent samples on achievement at the .05 level.
He finds t = 1.676, which is not significant because the critical values are ±2.074.

Because of small sample size we have a power problem. The estimated effect 
size is d = (33.5—28.83)/6.83 = .68 (cf. Section 3.2), which is undoubtedly of prac- 
tical significance since the groups differ by about two-thirds of a standard devia- 
tion. We have not detected it because of the power problem and because there is 
considerable within group variability on achievement. In fact, the pooled within 
correlation of I.Q. with achievement for the above data is about .80. This means 
that 64% of the variation in achievement test scores is associated with variation 
(individual differences) on I.Q. An analysis of covariance removes that portion
from the error term and yields a t value significant at the .05 level (t = 2.25).
Actually it comes out as an F statistic, so you need to take the square root. Recall
that F = t² for two groups. After reading this chapter, the reader will be able to verify
the above t value by running the ANCOVA on SAS or SPSS.
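
For readers who want to check these figures before reaching the SAS and SPSS runs, the following Python sketch works directly from the 24 scores in the example. It is only an illustration under stated assumptions: the helper names are our own, and the ANCOVA t is obtained from the standard one-covariate formulas (pooled within slope, adjusted error term with one degree of freedom lost, and the usual standard error for a difference between adjusted means), not from any package output.

```python
from math import sqrt

iq1 = [100, 113, 98, 110, 124, 135, 118, 93, 120, 127, 115, 104]
ach1 = [23, 31, 35, 28, 40, 42, 37, 29, 34, 45, 33, 25]
iq2 = [96, 108, 122, 103, 132, 120, 111, 93, 115, 125, 102, 107]
ach2 = [19, 26, 31, 22, 36, 38, 31, 25, 29, 41, 27, 21]


def mean(v):
    return sum(v) / len(v)


def ss(v):
    """Sum of squared deviations about the mean."""
    m = mean(v)
    return sum((a - m) ** 2 for a in v)


def sp(u, v):
    """Sum of cross products of deviations."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


n1, n2 = len(ach1), len(ach2)
ss_y = ss(ach1) + ss(ach2)            # pooled within SS on achievement
ss_x = ss(iq1) + ss(iq2)              # pooled within SS on I.Q.
sp_xy = sp(iq1, ach1) + sp(iq2, ach2)

# Independent samples t test on achievement (I.Q. ignored)
ms_w = ss_y / (n1 + n2 - 2)
diff = mean(ach1) - mean(ach2)
t_anova = diff / sqrt(ms_w * (1 / n1 + 1 / n2))
d = diff / sqrt(ms_w)                 # estimated effect size

# One-covariate ANCOVA expressed as a t statistic
r_w = sp_xy / sqrt(ss_x * ss_y)       # pooled within correlation
b_w = sp_xy / ss_x                    # pooled within slope
ms_w_adj = (ss_y - sp_xy ** 2 / ss_x) / (n1 + n2 - 2 - 1)
x_diff = mean(iq1) - mean(iq2)
adj_diff = diff - b_w * x_diff        # difference between adjusted means
se = sqrt(ms_w_adj * (1 / n1 + 1 / n2 + x_diff ** 2 / ss_x))
t_ancova = adj_diff / se

print(round(t_anova, 2), round(d, 2), round(r_w, 2), round(t_ancova, 2))
```

Squaring `t_ancova` gives the F statistic that a package would report for the treatment effect.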

The above example showed that analysis of covariance is very useful in creating 
a more powerful test in an experimental study. ANCOVA is also used to reduce 
bias when comparing intact or self-selected groups, such as males and females, 
Head Start and non-Head Start. A classical use is adjusting posttest means on the 
dependent variable for any initial differences that may have been present on a pre- 
test. Another typical use is in teaching methods studies that use intact classrooms. 
If the average I.Q.’s for the classrooms differ by 10 points, then an adjustment of 
the posttest achievement is done. Although the use of analysis of covariance in this 
context may seem reasonable, it is quite controversial, which we discuss in detail 
in Section 7.8. 

The first 10 sections of this chapter cover the basics for ANCOVA with one 
covariate. We discuss the purposes of covariance, the underlying concepts, the as- 
sumptions, interpretation of results, the relationship of ANOVA and ANCOVA, 
and the running of ANCOVA on SAS and SPSS. The last five sections are more ad- 
vanced, especially the section on the Johnson-Neyman technique, and may be 
skipped without loss of continuity. Much has been written about analysis of 
covariance, and the reader should at least be aware of two classic review articles by 
Cochran (1957) and Elashoff (1969), and a very comprehensive and thorough 
book on covariance and alternatives by Huitema (1980). 


7.2 PURPOSES OF COVARIANCE 


Analysis of covariance is related to the following two basic objectives in experi- 
mental design: 


1. Elimination of systematic bias. 
2. Reduction of within group or error variance. 


Systematic bias means that the groups differ systematically on some key vari- 
able(s) that are related to performance on the dependent variable. If the groups in- 
volve treatments, then a significant difference on a posttest at the end of treatments 
will be confounded (mixed in with) with initial differences on a key variable. It 
would not be clear whether the treatments were making the difference, or whether 
initial differences simply transferred to posttest means. A simple example is a 
teaching methods study with initial differences between groups on I.Q. Suppose 
two methods of teaching algebra are compared (same teacher for both methods) 
with two classrooms in the same school. The following summary data, means for
the groups, are available:


            Method 1    Method 2
I.Q.          120.2       105.8
Posttest       73.4        67.5


If the t test for independent samples on the posttest is significant, then it isn't 
clear whether it was method 1 that made the difference, or the fact that the children 
in that class were "brighter" to begin with, and thus would be expected to achieve 
higher scores. 

As another example, suppose we are comparing the effect of four stress situa- 
tions on blood pressure (the dependent variable). It is found that situation 3 is sig- 
nificantly more stressful than the other three situations. However, we note that the 
blood pressure of the subjects in group 3 under minimal stress is greater than for 
the subjects in the other groups. Then, it isn't clear that situation 3 is necessarily 
most stressful. We need to determine whether the blood pressure for group 3 would 
still be higher if the posttest means for all 4 groups were “adjusted” in some way to 
account for initial differences in blood pressure. We see later that the posttest 
means are adjusted in a linear fashion to what they would be if all groups started 
out equally on the covariate, that is, at the grand mean. 

The best way of dealing with systematic bias is to randomly assign subjects to 
groups. Then we can be confident, within sampling error, that the groups don’t dif- 
fer systematically on any variables. Of course, in many studies random assign- 
ment is not possible, so we look for ways of at least partially equating groups. One 
way of partially controlling for initial differences is to match on key variables. Of 
course, then we can only be sure the groups are equivalent on those matched vari- 
ables. Analysis of covariance is a statistical way of controlling on key variables. 
Once again, as with matching, ANCOVA can only reduce bias, and not eliminate 
it. 

Why is reduction of error variance, the second purpose of analysis of 
covariance, important? Recall from Chapter 2 on one way ANOVA that the F
statistic was $F = MS_b/MS_w$, where $MS_w$ was the estimate of error. If we can make
$MS_w$ smaller, then F will be larger and we will obtain a more sensitive or powerful
test. And from Chapter 3 on power, remember that power is generally poor in
small or medium sample size studies. Thus the use of perhaps 2 or 3 covariates 
in such studies should definitely be considered. Covariates that have
relatively low correlations with each other are particularly helpful because each
covariate removes a somewhat different part of the error variance from the dependent
variable.

Analysis of covariance is a statistical way of reducing error variance. There are 
several other ways of reducing error variance. One way is through sample selection;
subjects who are more homogeneous vary less on the dependent measure.
Another way, discussed in Chapter 4 on factorial designs, was to block on a vari- 
able, or consider it as another factor in the design. 


7.3 ADJUSTMENT OF POSTTEST MEANS 


As mentioned earlier, analysis of covariance adjusts the posttest means to what 
they would be if all groups started out equally on the covariate, that is, at the grand mean.
In this section we derive the general equation for linearly adjusting the posttest 
means for one covariate. Before we do that, however, it is important to discuss one 
of the assumptions underlying the analysis of covariance. That assumption for one 
covariate requires equal population regression slopes for all groups. Consider a 
three group situation, with 15 subjects per group. Suppose that the scatterplots for 
the 3 groups looked as given below. 


[Scatterplots of y against x for Group 1, Group 2, and Group 3]


Recall from beginning statistics that the x and y scores for each subject deter- 
mine a point in the plane. Requiring that the slopes be equal is equivalent to saying 
that the nature of the linear relationship is the same for all groups, or that the rate of 
change in y as a function of x is the same for all groups. For the above scatterplots 
the slopes are different, with the slope being the largest for group 2 and smallest for 
group 3. But the issue is whether the population slopes are different, and whether 
the sample slopes differ sufficiently to conclude that the population values are dif- 
ferent. With small sample sizes as in the above scatterplots, it is dangerous to rely 
on visual inspection to determine whether the population values are equal, because 
of considerable sampling error. Fortunately there is a statistic for this, and later we 
indicate how to obtain it on SPSS and SAS. In deriving the equation for the ad- 
justed means we are going to assume the slopes are equal. What if the slopes are 
not equal? Then ANCOVA is not appropriate, and we indicate alternatives later on 
in the chapter. 
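[Figure: regression line, with $\bar{x}_i$ and $\bar{x}$ marked on the x axis]

Slope of straight line $= b = \dfrac{\text{change in } y}{\text{change in } x}$

$$b = \frac{\bar{y}_i^* - \bar{y}_i}{\bar{x} - \bar{x}_i}$$

$$b(\bar{x} - \bar{x}_i) = \bar{y}_i^* - \bar{y}_i$$

$$\bar{y}_i^* = \bar{y}_i + b(\bar{x} - \bar{x}_i)$$

$$\bar{y}_i^* = \bar{y}_i - b(\bar{x}_i - \bar{x})$$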


FIGURE 7.1 Deriving the General Equation for Adjusted Means in Covariance 


The details of obtaining the adjusted mean for the ith group (i.e., any group) are 
given in Figure 7.1. The general equation follows from the definition for the slope 
of a straight line and some basic algebra. 

In Figure 7.2 we show the adjusted means geometrically for a hypothetical 3 
group data set. A positive correlation is assumed between the covariate and the de- 
pendent variable, so that a higher mean on x implies a higher mean on y. Note that 
since group 1 scored below the grand mean on the covariate, its mean is adjusted 
upward. On the other hand, since the mean for group 3 on the covariate is above the
grand mean, covariance estimates that it would have scored lower on y if its mean
on the covariate was lower (at grand mean), and therefore the mean for group 3 is
adjusted downward.


[Figure: regression lines for Groups 1, 2, and 3, with y (50 to 80) plotted against x (30 to 60); the group 2 mean is at (34, 65) and the grand mean on x is 36]

                 Group 1    Group 2    Group 3
$\bar{x}_i$         32         34         42
$\bar{y}_i$         70         65         62
$\bar{y}_i^*$       72         66         59


A common slope = .5 is being assumed here.


$\bar{y}_1^* = 70 - .5(32 - 36)$, $\bar{y}_2^* = 65 - .5(34 - 36)$, $\bar{y}_3^* = 62 - .5(42 - 36)$


(1) The arrows on the regression lines indicate that the means are adjusted linearly upward or downward
to what they would be if the groups had started out at the grand mean on the covariate.


FIGURE 7.2 Means and Adjusted Means for Hypothetical Three Group Data Set
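
The adjustment in Figure 7.2 is easy to reproduce. The short Python sketch below applies the general equation for the adjusted mean to the three hypothetical groups; the function name `adjusted_mean` is our own, and equal group sizes are assumed so that the grand mean on the covariate is the simple average of the three group means.

```python
def adjusted_mean(y_mean, x_mean, grand_x_mean, slope):
    """Adjusted mean: y_mean - b * (x_mean - grand mean on the covariate)."""
    return y_mean - slope * (x_mean - grand_x_mean)


x_means = [32, 34, 42]      # covariate means for groups 1, 2, 3
y_means = [70, 65, 62]      # posttest means for groups 1, 2, 3
slope = 0.5                 # common slope assumed in Figure 7.2

# Equal group sizes assumed, so the grand mean is the average of the group means
grand_mean_x = sum(x_means) / len(x_means)

adjusted = [adjusted_mean(y, x, grand_mean_x, slope)
            for x, y in zip(x_means, y_means)]
print(grand_mean_x, adjusted)   # 36.0 [72.0, 66.0, 59.0]
```

Groups below the grand mean on the covariate are moved up and groups above it are moved down, in proportion to the common slope.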
7.4 REDUCTION OF ERROR VARIANCE


It is relatively simple to derive the approximate error term for covariance. Denote
the correlation between the covariate (x) and the dependent variable (y) by $r_{xy}$. The
square of a correlation can be interpreted as “proportion of variance accounted
for.” The within group variance for ANOVA is $MS_w$. Thus, the part of the within
group variance on y that is accounted for by the covariate is $r_{xy}^2 MS_w$. The within
variability left, after the portion due to the covariate is removed, is


$$MS_w - MS_w r_{xy}^2 = MS_w(1 - r_{xy}^2) \qquad (1)$$


and this becomes our new error term for the analysis of covariance, which we denote
by $MS_w^*$. Technically, there is an additional part to the adjusted error term:


$$MS_w^* = MS_w(1 - r_{xy}^2)[1 + 1/(f_e - 2)]$$


where $f_e$ is the error degrees of freedom. However, the effect of this additional factor
is slight as long as N > 50.

To show how much of a difference a covariate can make in increasing the sensi- 
tivity of an experiment, we consider a hypothetical study. An investigator runs a 
one-way ANOVA (3 groups and 20 subjects per group), and obtains F = 200/100 = 
2, which is not significant, because the critical value at .05 is 3.18. He pretested the 
subjects, but didn't use the pretest as a covariate (even though the correlation be- 
tween covariate and posttest was .71) because the groups didn't differ significantly 
on the pretest. This is a common mistake made by some researchers who are un- 
aware of the other purpose of covariance, that of reducing error variance. The anal- 
ysis is redone by another investigator using ANCOVA. Using the equation we just 
derived she finds 


$$MS_w^* \approx 100[1 - (.71)^2] = 50$$


Thus, the error term for the ANCOVA is only half as large as the error term for
ANOVA. It is also necessary to obtain a new $MS_b$ for ANCOVA, call it $MS_b^*$. In
Section 7.6 we show how to calculate $MS_b^*$. Let us assume here that the investigator
obtains the following F ratio for the covariance analysis:


$$F^* = 190/50 = 3.8$$


This is significant at the .05 level. Therefore, the use of covariance can make the
difference between finding and not finding significance. Finally, we wish to note
that $MS_b^*$ can be smaller or larger than $MS_b$, although in a randomized study the
expected values of the two are equal.
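
The arithmetic for the hypothetical study can be sketched in a few lines of Python. The function name `adjusted_error` is ours. For the optional correction factor we take the error degrees of freedom to be 57 (60 subjects minus 3 groups); that choice is an assumption made for illustration, since the value is not stated in the example.

```python
def adjusted_error(ms_w, r_xy, error_df=None):
    """Approximate ANCOVA error term; adds the correction factor when error_df is given."""
    approx = ms_w * (1 - r_xy ** 2)
    if error_df is None:
        return approx
    return approx * (1 + 1 / (error_df - 2))


ms_w = 100.0     # ANOVA error term from the example
r_xy = 0.71      # correlation between covariate and posttest

approx = adjusted_error(ms_w, r_xy)                    # about 50
corrected = adjusted_error(ms_w, r_xy, error_df=57)    # error_df = 57 is an assumption

f_anova = 200 / ms_w         # F from the original one-way ANOVA
f_ancova = 190 / approx      # F* using the adjusted error term

print(round(approx, 2), round(corrected, 2), round(f_anova, 2), round(f_ancova, 2))
```

With 60 subjects the correction factor changes the error term by less than one unit, which illustrates why its effect is described as slight.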
7.5 CHOICE OF COVARIATES


In general, any variables that theoretically should correlate with the dependent
variable, or variables that have been shown to correlate on similar types of subjects,
should be considered as possible covariates. The ideal is to choose as
covariates variables that are significantly correlated with the dependent
variable and have low correlations among themselves. If two covariates are highly
correlated (say .80), then they are removing much of the same error variance from
y; x2 will not have much incremental validity. On the other hand, if two covariates
(x1 and x2) have a low correlation (say .20), then they are removing relatively distinct
pieces of the error variance from y, and we will obtain a much greater total error
reduction. This is illustrated graphically below using Venn diagrams, where the
circle represents error variance on y.


[Venn diagrams: left panel, x1 and x2 with low correlation; right panel, x1 and x2 with high correlation. Solid lines mark the part of variance on y that x1 accounts for; dashed lines mark the part of variance on y that x2 accounts for.]


The shaded portion in each case represents the incremental validity of x2, that is,
the part of error variance on y it removes that x1 did not.
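
A small numerical sketch may help here. It uses the standard formula for the squared multiple correlation with two predictors and compares the incremental validity of x2 when the two covariates correlate .20 versus .80. The correlations of each covariate with y (both set to .50) are hypothetical values chosen only for illustration, and the function names are ours.

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation of y with x1 and x2."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def incremental_validity(r_y1, r_y2, r_12):
    """Variance on y that x2 accounts for beyond x1."""
    return r2_two_predictors(r_y1, r_y2, r_12) - r_y1 ** 2


r_y1 = r_y2 = 0.50     # hypothetical correlations of each covariate with y

gain_low = incremental_validity(r_y1, r_y2, 0.20)    # covariates nearly distinct
gain_high = incremental_validity(r_y1, r_y2, 0.80)   # covariates largely redundant

print(round(gain_low, 3), round(gain_high, 3))
```

Under these assumed values the second covariate removes roughly six times as much additional variance when the covariates correlate .20 as when they correlate .80.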

Huitema (1980, p. 161) has recommended limiting the number of covariates to
the extent that the ratio


$$\frac{C + (J - 1)}{N} < .10$$


where C is the number of covariates, J is the number of groups, and N is total sample
size. Thus, if we had a four group problem with a total of 80 subjects, then (C +
3)/80 < .10 or C < 5. Fewer than 5 covariates should be used. If the above ratio is >
.10, then the adjusted means are likely to be unstable.
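
The rule is simple enough to check by hand, but a short Python sketch shows how it might be applied to any design. The function names are our own, and exact fractions are used so that the boundary case (a ratio of exactly .10) is handled without rounding error.

```python
from fractions import Fraction


def covariate_ratio(c, j, n):
    """Huitema's ratio [C + (J - 1)] / N as an exact fraction."""
    return Fraction(c + (j - 1), n)


def max_covariates(j, n, limit=Fraction(1, 10)):
    """Largest number of covariates that keeps the ratio strictly below the limit."""
    c = 0
    while covariate_ratio(c + 1, j, n) < limit:
        c += 1
    return c


# Four groups, 80 subjects in total, as in the example above
print(max_covariates(4, 80))   # 4
```

With 5 covariates the ratio equals .10 exactly, so 4 is the largest number that satisfies the strict inequality.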


7.6 NUMERICAL EXAMPLE 


We now consider an example to illustrate how to calculate an ANCOVA and to 
make clear what the null hypothesis is that is being tested. We use the following 3 
group data set from Myers (1979, p. 417), where x indicates the covariate: 
   Group 1        Group 2        Group 3
   x     y        x     y        x     y
  12    26       11    32        6    23
  10    22       12    31       13    35
   7    20        6    20       15    44
  14    34       18    41       15    41
  12    28       10    29        7    28
  11    26       11    31        9    30

Recall that in the one way ANOVA the null hypothesis was $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
(population means are equal). But in analysis of covariance we are adjusting the
means (Section 7.3), so that the null hypothesis becomes $H_0: \mu_1^* = \mu_2^* = \cdots = \mu_k^*$, that
is, the adjusted population means are equal. In the above example, the specific null
hypothesis is $H_0: \mu_1^* = \mu_2^* = \mu_3^*$. In ANCOVA we adjust sums of squares corresponding
to the sums of squares total, within and between from ANOVA. We denote these
adjusted sums of squares by $SS_t^*$, $SS_w^*$, and $SS_b^*$, respectively. $SS_b^*$ is obtained by
subtracting $SS_w^*$ from $SS_t^*$.

An ANOVA on the above Myers data, as the reader should check, yields a 
within cells sum of squares of 666.83 and a group sum of squares of 172.11. We 
will need these results in obtaining the ANCOVA. Recall that $SS_t$ from ANOVA
measures variability of the subjects' scores on the dependent variable about the grand mean:


$$SS_t = \sum\sum (y_{ij} - \bar{y})^2$$


Let $r_{xy}$ denote the correlation between the covariate and the dependent variable
for all the scores, disregarding group membership. Remember that $r_{xy}^2$ can be interpreted
as proportion of variance accounted for. Thus, $r_{xy}^2 SS_t$ represents the
amount of variability on y that is accounted for by its relationship with the
covariate. Therefore, the remaining variability on y, or the adjusted total sum of
squares, is given by


$$SS_t^* = (1 - r_{xy}^2)SS_t \qquad (2)$$


Now consider the pooled within correlation for x and y, that is, where group
membership is taken into account. Although not strictly true, this correlation can
be thought of as the average (or weighted average for unequal group sizes) of the
correlations within the groups. Denote this correlation by $r_{xy(w)}$. Then the amount
of within group variability on y accounted for by the covariate is given by
$r_{xy(w)}^2 SS_w$. Therefore, the remaining within variability on y, or the adjusted within
sum of squares, is given by


$$SS_w^* = (1 - r_{xy(w)}^2)SS_w \qquad (3)$$


Finally, the adjusted between sum of squares is obtained as the difference be- 
tween the adjusted total and adjusted within: 
$$SS_b^* = SS_t^* - SS_w^* \qquad (4)$$

The F ratio for analysis of covariance is then given by

$$F^* = \frac{SS_b^*/(k - 1)}{SS_w^*/(N - k - C)} = \frac{MS_b^*}{MS_w^*} \qquad (5)$$


where C is the number of covariates. Note that one degree of freedom for error is 
lost for each covariate used. 

This method of computing the ANCOVA is conceptually fairly simple, and im- 
portantly shows its direct linkage with the results from an ANOVA on the same 
data. The SAS GLM control lines for running the ANCOVA are presented in Table 
7.1, along with selected printout. The total correlation is .85286 and the within 
group correlations are gp 1: .9316, gp 2: .9799, and gp 3: .9708. Using these re- 
sults, the F ratio for the ANCOVA is easily obtained. First, from Equation 2 we 
have that 


$$SS_t^* = (1 - (.85286)^2)(838.94) = 228.72$$


TABLE 7.1 
SAS GLM Control Lines and Selected Printout for ANCOVA on Myers 
Data and SPSS Windows 12.0 Interactive Plots and Regression Lines 


[Placeholder for Table 7.1 (T0701 and T0701-b) from pp. 315-316 of previous edition.]


Now, using the average of the within correlations as a rough estimate of the
pooled within correlation, we find that $\bar{r} = (.9316 + .9799 + .9708)/3 = .9608$ (the
actual pooled correlation is .965). Using Equation 3 with the actual pooled
correlation, we find the adjusted within sum of squares:

$$SS_w^* = (1 - (.965)^2)(666.83) = 45.86$$


Therefore, the adjusted between sum of squares is:

$$SS_b^* = 228.72 - 45.86 = 182.86$$


and the F ratio for the analysis of covariance is


$$F^* = \frac{182.86/2}{45.86/(18 - 3 - 1)} = 27.91$$
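
The hand calculation above can be checked with a short Python sketch that applies Equations 2 through 5 to the Myers data. The variable and helper names are our own, and the code carries full precision throughout, so its F ratio differs slightly from a value built up out of rounded intermediate results.

```python
from math import sqrt

# Myers data: (covariate x, dependent variable y) for each group
groups = [
    ([12, 10, 7, 14, 12, 11], [26, 22, 20, 34, 28, 26]),
    ([11, 12, 6, 18, 10, 11], [32, 31, 20, 41, 29, 31]),
    ([6, 13, 15, 15, 7, 9], [23, 35, 44, 41, 28, 30]),
]


def mean(v):
    return sum(v) / len(v)


def sp(u, v):
    """Sum of cross products of deviations (sum of squares when u is v)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


all_x = [x for xs, _ in groups for x in xs]
all_y = [y for _, ys in groups for y in ys]
n_total, k, c = len(all_y), len(groups), 1

# ANOVA sums of squares on y
ss_t = sp(all_y, all_y)
ss_w = sum(sp(ys, ys) for _, ys in groups)
ss_b = ss_t - ss_w

# Total and pooled within correlations between x and y
r_xy = sp(all_x, all_y) / sqrt(sp(all_x, all_x) * ss_t)
ssx_w = sum(sp(xs, xs) for xs, _ in groups)
sp_w = sum(sp(xs, ys) for xs, ys in groups)
r_w = sp_w / sqrt(ssx_w * ss_w)

# Equations 2 through 5
ss_t_adj = (1 - r_xy ** 2) * ss_t
ss_w_adj = (1 - r_w ** 2) * ss_w
ss_b_adj = ss_t_adj - ss_w_adj
f_star = (ss_b_adj / (k - 1)) / (ss_w_adj / (n_total - k - c))

print(round(ss_w, 2), round(ss_b, 2), round(r_xy, 5), round(r_w, 3))
print(round(ss_t_adj, 2), round(ss_w_adj, 2), round(f_star, 3))
```

The printed correlations should agree with the .85286 and .965 quoted in the text, and the F ratio with the value reported by the packages.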


ANCOVA as a Special Case of Multiple Regression 


Since analysis of covariance involves both analysis of variance and regression 
analysis, we can do an ANCOVA using multiple regression. Recall that in the last 
chapter on regression analysis we showed that ANOVA was a special case of re- 
gression analysis. We dummy coded group membership and used these dummy 
variables to predict the dependent variable. 

We will illustrate how an ANCOVA can be done using multiple regression with
the Myers data. First, we shall check the homogeneity of regression slopes assumption.
For a one way design, as Myers and Well (1991, p. 567) point out, “Performing
an ANCOVA on a design that has a single factor A can now be seen as determining
whether A has effects over and above those of the covariate x.” Thus, we
force the covariate in and then determine whether group membership has an effect
above and beyond the covariate. Since we have 3 groups here, we will need two
dummy variables to code group membership (we denote them by DUM1, DUM2).
Recall that a violation of the slopes assumption meant there was a group by
covariate interaction. Thus, we set up an interaction effect and test it for significance.
We create the group by covariate interaction effects by multiplying each
dummy variable by the covariate (we denote them by COVDUM1 and COVDUM2)
and then test these for significance.
The complete control lines for testing homogeneity of slopes and doing the
ANCOVA are presented in Table 7.2.

Selected printout from SPSS for Windows 12.0 is presented in Table 7.3. Note 
that the assumption of equal regression slopes is tenable (F = .354), and that the 
ANCOVA is significant (F = 27.886). 
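
The regression approach can also be sketched outside the packages. The Python code below is an illustration only, not the SPSS procedure itself: it fits three nested least squares models to the Myers data (covariate only; covariate plus the two dummy variables; covariate, dummies, and the two product terms) and forms the two F ratios from the residual sums of squares. The helper functions `solve` and `residual_ss` are our own and use the normal equations, which is adequate for a data set this small.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def residual_ss(columns, y):
    """Residual sum of squares from regressing y on the given predictor columns."""
    rows = list(zip(*columns))
    p = len(columns)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


# Myers data, groups 1, 2, 3 in order (6 subjects each)
covar = [12, 10, 7, 14, 12, 11, 11, 12, 6, 18, 10, 11, 6, 13, 15, 15, 7, 9]
dep = [26, 22, 20, 34, 28, 26, 32, 31, 20, 41, 29, 31, 23, 35, 44, 41, 28, 30]
n = len(dep)

ones = [1.0] * n                                        # intercept
dum1 = [1.0 if i < 6 else 0.0 for i in range(n)]        # group 1 indicator
dum2 = [1.0 if 6 <= i < 12 else 0.0 for i in range(n)]  # group 2 indicator
covdum1 = [c * d for c, d in zip(covar, dum1)]          # group by covariate products
covdum2 = [c * d for c, d in zip(covar, dum2)]

rss_cov = residual_ss([ones, covar], dep)
rss_ancova = residual_ss([ones, covar, dum1, dum2], dep)
rss_full = residual_ss([ones, covar, dum1, dum2, covdum1, covdum2], dep)

# Homogeneity of slopes: do the product terms add anything?
f_slopes = ((rss_ancova - rss_full) / 2) / (rss_full / (n - 6))
# ANCOVA: does group membership add anything beyond the covariate?
f_ancova = ((rss_cov - rss_ancova) / 2) / (rss_ancova / (n - 4))

print(round(f_slopes, 3), round(f_ancova, 3))
```

The two values printed should be close to the F = .354 and F = 27.886 quoted above for the slopes test and the ANCOVA.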


TABLE 7.2
Syntax Command File for Homogeneity of Slopes Test and ANCOVA on Myers Data Using SPSS for Windows 12.0

TITLE '... ON MYERS DATA-ANCOVA'.
DATA LIST FREE/COVAR DEP DUM1 DUM2 COVDUM1 COVDUM2.
BEGIN DATA.
12 26 1 0 12 0
10 22 1 0 10 0
7 20 1 0 7 0
14 34 1 0 14 0
12 28 1 0 12 0
11 26 1 0 11 0
11 32 0 1 0 11
12 31 0 1 0 12
6 20 0 1 0 6
18 41 0 1 0 18
10 29 0 1 0 10
11 31 0 1 0 11
6 23 0 0 0 0
13 35 0 0 0 0
15 44 0 0 0 0
15 41 0 0 0 0
7 28 0 0 0 0
9 30 0 0 0 0
END DATA.
LIST.
REGRESSION DESCRIPTIVES = DEFAULT/
   VARIABLES = COVAR TO COVDUM2/
   DEPENDENT = DEP/
   ENTER COVAR DUM1 DUM2/TEST (COVDUM1 COVDUM2)/.        (1)
REGRESSION DESCRIPTIVES = DEFAULT/
   VARIABLES = COVAR TO DUM2/
   DEPENDENT = DEP/
   ENTER COVAR/TEST (DUM1 DUM2)/.                        (2)

(1) This is the statement which yields the homogeneity of slopes test.
(2) This is testing the main hypothesis in ANCOVA, whether the adjusted population means are equal.


COVAR     DEP     DUM1    DUM2   COVDUM1   COVDUM2
12.00    26.00    1.00     .00    12.00       .00
10.00    22.00    1.00     .00    10.00       .00
 7.00    20.00    1.00     .00     7.00       .00
14.00    34.00    1.00     .00    14.00       .00
12.00    28.00    1.00     .00    12.00       .00
11.00    26.00    1.00     .00    11.00       .00
11.00    32.00     .00    1.00      .00     11.00
12.00    31.00     .00    1.00      .00     12.00
 6.00    20.00     .00    1.00      .00      6.00
18.00    41.00     .00    1.00      .00     18.00
10.00    29.00     .00    1.00      .00     10.00
11.00    31.00     .00    1.00      .00     11.00
 6.00    23.00     .00     .00      .00       .00
13.00    35.00     .00     .00      .00       .00
15.00    44.00     .00     .00      .00       .00
15.00    41.00     .00     .00      .00       .00
 7.00    28.00     .00     .00      .00       .00
 9.00    30.00     .00     .00      .00       .00


7.7 ASSUMPTIONS IN ANALYSIS OF COVARIANCE


Analysis of covariance rests on the same assumptions as the analysis of variance
plus three additional assumptions regarding the regression part of the covariance
analysis. ANCOVA also assumes


1. A linear relationship between the dependent variable and the covariate(s).

2. Homogeneity of the regression slopes (for one covariate); parallelism of
the regression planes for two covariates and, for more than 2 covariates,
homogeneity of the regression hyperplanes.

3. The covariate is measured without error.


Since covariance rests on the same assumptions as ANOVA, any violations that 
are serious in ANOVA (like dependent observations) are also serious in ANCOVA. 
Violation of any of the 3 regression assumptions above can also be serious. For example,
if the relationship between the covariate and the dependent variable is
curvilinear, then the adjustment of the means will be improper. 

There is always measurement error for the variables that are typically used as 
covariates in social science research. In randomized designs this reduces the power 
of the ANCOVA, but treatment effects are not biased. For non-randomized designs 
the treatment effects can be seriously biased. 

A violation of the homogeneity of regression slopes can also yield quite mis- 
leading results. To illustrate this, we present in Figure 7.3 the situation where the 
assumption is met and two situations where the slopes are unequal. Notice that 
with equal slopes the estimated superiority of group 1 at the grand mean is a totally
accurate estimate of group 1's superiority for all levels of the covariate, since the
lines are parallel. For Case 1 of unequal slopes there is a covariate by treatment
interaction. That is, how much better group 1 is depends on which value of the
covariate we specify. This is analogous to the concept of interaction in a factorial
design. For Case 2 of heterogeneous slopes the use of covariance would be totally
misleading. Covariance estimates no difference between the groups, while for x =
c, group 2 is quite superior, and for x = d, group 1 is quite superior. Later in the
chapter we show how to test the assumption of equal slopes on SPSS and on SAS. 

Therefore, in examining printout from the statistical packages it is impor- 
tant to first make two checks to determine whether analysis of covariance is 
appropriate: 


1. Check to see whether there is a linear relationship between the de- 
pendent variable and the covariate. 

2. Check to determine whether the homogeneity of the regression slopes 
is tenable. 


If the above assumptions are met, then there is not any debate about the appro- 
priateness of ANCOVA in randomized studies in which the subjects have been ran- 
domly assigned to groups. For intact groups, there is a debate, and we discuss that 
in the next section. 

If either of the above assumptions is not satisfied, then covariance is not appropriate.
In particular, if (2) is not met, then one should consider using the
Johnson-Neyman (1936) technique. For extended discussion on the
Johnson-Neyman technique see Rogosa (1977, 1980).


TABLE 7.3
Selected Printout From SPSS for Windows 12.0 for ANCOVA on Myers
Data Using Multiple Regression


(1) This is the test for a significant regression of y on the covariate.
(2) This is the test for homogeneity of the regression slopes.
(3) This is the main test in covariance; whether the adjusted population means are equal.


7.8 USE OF ANCOVA WITH INTACT GROUPS 


It should be noted that some researchers (Anderson, 1963; Lord, 1969) have ar- 
gued strongly against using analysis of covariance with intact groups. Although 
we do not take this position, it is important that the reader be aware of the several 
limitations and/or possible dangers when using ANCOVA with intact groups. 
First, even the use of several covariates will not equate intact groups, and one 
should never be deluded into thinking it can. The groups may still differ on some 
unknown important variable(s). Also, note that equating groups on one variable 
may result in accentuating their differences on other variables. 

Second, recall that ANCOVA adjusts the posttest means to what they would be 
if all the groups had started out equal on the covariate(s). You then need to consider 
whether groups that are equal on the covariate would ever exist in the real world. 
Elashoff (1969) gives the following example. Teaching methods A and B are being 
compared. The class using A is composed of high ability students, whereas the 
class using B is composed of low ability students. A covariance analysis can be 
done on the posttest achievement scores holding ability constant, as if A and B had 
been used on classes of equal and average ability. But, as Elashoff notes, “It may 
make no sense to think about comparing methods A and B for students of average 
ability, perhaps each has been designed specifically for the ability level it was used 
with, or neither method will, in the future, be used for students of average ability” 
(p. 387). 

Third, the assumptions of linearity and homogeneity of regression slopes need 
to be satisfied for ANCOVA to be appropriate. 

A fourth issue that can confound the interpretation of results is differential 
growth of subjects in intact or self-selected groups on some dependent variable. If
the natural growth is much greater in one group (treatment) than for the control
group and covariance finds a significant difference, after adjusting for any pretest
differences, then it isn't clear whether the difference is due to treatment, differential
growth, or part of each. Bryk and Weisberg (1977) discuss this issue in detail
and propose an alternative approach for such growth models. 

A fifth problem is that of measurement error. Of course this same problem is 
present in randomized studies. But there the effect is merely to attenuate power. In 
non-randomized studies measurement error can seriously bias the treatment effect. 
Reichardt (1979), in an extended discussion on measurement error in ANCOVA, 
states, 
Measurement error in the pretest can therefore produce spurious treatment effects
when none exist. But it can also result in a finding of no intercept difference when a 
true treatment effect exists, or it can produce an estimate of the treatment effect 
which is in the opposite direction of the true effect. (p. 164) 


It is no wonder then that Pedhazur (1982, p. 524), in discussing the effect of measurement
error when comparing intact groups, says,


The purpose of the discussion here was only to alert you to the problem in the hope 
that you will reach two obvious conclusions: (1) that efforts should be directed to 
construct measures of the covariates that have very high reliabilities and (2) that ig- 
noring the problem, as is unfortunately done in most applications of ANCOVA, will 
not make it disappear. (p. 524) 


Porter (1967) has developed a procedure to correct ANCOVA for measurement 
error, and an example illustrating that procedure is given in Huitema (1980, pp. 
315-316). This is beyond the scope of the present text. 

Given all of the above problems, the reader may well wonder whether we 
should abandon the use of covariance when comparing intact groups. But other 
statistical methods for analyzing this kind of data (such as matched samples, gain 
score ANOVA) suffer from many of the same problems, such as seriously biased 
treatment effects. The fact is that inferring cause-effect from intact groups is 
treacherous, regardless of the type of statistical analysis. Therefore, the task is to 
do the best we can and exercise considerable caution, or as Pedhazur (1982) put it: 
“But the conduct of such research, indeed all scientific research, requires sound 
theoretical thinking, constant vigilance, and a thorough understanding of the po- 
tential and limitations of the methods being used” (p. 525). 


7.9 COMPUTER EXAMPLE FOR ANCOVA 


To illustrate how to run an ANCOVA, while at the same time checking the critical 
assumptions of linearity and homogeneity of slopes, we consider part of a Sesame 
Street data set from Glasnapp and Poggio (1985), who present data on many vari- 
ables, including 12 background variables and 8 achievement variables for 240 sub- 
jects. Sesame Street was developed as a television series aimed mainly at teaching 
preschool skills to 3- to 5-year-old children. Data was collected at 5 different sites 
on many achievement variables both before (pretest) and after (posttest) viewing 
of the series. We consider here only the achievement variable of knowledge of 
numbers. The maximum possible score is 54 and the content of the items included 
recognizing numbers, naming numbers, counting, addition, and subtraction. We 
use ANCOVA to determine whether the posttest knowledge of numbers for the
children at the first 3 sites differed after adjustments are made for any pretest
differences.


TABLE 7.4
SPSS MANOVA Control Lines for Analysis of Covariance on Sesame
Street Data


    TITLE 'ANALYSIS OF COVARIANCE ON SESAME DATA'.
    DATA LIST FREE/SITE PRENUMB POSTNUMB.
    BEGIN DATA.
        DATA (ON CD)
    END DATA.
    MANOVA PRENUMB POSTNUMB BY SITE(1,3)/
(1) ANALYSIS POSTNUMB WITH PRENUMB/
(2) PRINT = PMEANS/
    DESIGN/
(3) ANALYSIS = POSTNUMB/
    DESIGN = PRENUMB, SITE, PRENUMB BY SITE/
(4) ANALYSIS = PRENUMB/.


(1) The covariate(s) follow the keyword WITH.

(2) This PRINT subcommand is needed to obtain the adjusted means, which is what we are testing
for significance.

(3) This ANALYSIS subcommand and the following DESIGN subcommand are needed to test the
homogeneity of the regression slopes assumption.

(4) This ANALYSIS subcommand is used to test whether the sites differed significantly on the
pretest.

In Table 7.4 we give the complete control lines for running the ANCOVA on 
SPSS MANOVA. Table 7.5 gives selected annotated output from that run. We indi- 
cate which of the F tests are checking the assumptions of linearity and homogene- 
ity of slopes, and which F addresses the main question in covariance (whether the 
adjusted population means are equal). 


7.10 ALTERNATIVE ANALYSES 


When comparing two or more groups with pretest and posttest data, the following 
other modes of analysis have been used by many researchers: 


1. An ANOVA is done on the difference or gain scores (posttest - pretest).

2. A two way repeated measures ANOVA (covered in Chapter 5) is
done. This is also called a one between (the grouping variable) and one
within (pretest-posttest part) factor ANOVA.


TABLE 7.5 
Selected Printout from SPSS MANOVA for ANCOVA on Sesame Street 
Data 


Placeholder for T0705 from p. 236 of the previous edition.


(1) This indicates there is a significant correlation between the dependent variable and the
covariate (PRENUMB), or equivalently a significant regression of POSTNUMB on PRENUMB.

(2) This test indicates that homogeneity of regression slopes is tenable at the .05 level, since the p
value is .607.

(3) This F is testing the main result in ANCOVA; whether the adjusted population means are equal.
This is rejected at the .05 level, indicating SITE differences.

(4) These are the adjusted means. Since the estimated common regression slope is .686 (given on the
printout but not presented here), the adjusted mean for SITE 1 is

$$\bar{y}_1^* = 30.083 - .686(22.4 - 21.67) = 29.58$$

(5) This test indicates the subjects at the 3 sites differ significantly on the pretest, i.e., on
PRENUMB.
Huck and McLean (1975) and Jennings (1988) have compared the above two
modes of analysis along with the use of ANCOVA for the pretest-posttest control 
group design, and conclude that ANCOVA is the preferred method of analysis. 
Several comments from the Huck and McLean article are worth mentioning. First, 
they note that with the repeated measures approach it is the interaction F that is in- 
dicating whether the treatments had a differential effect, and not the treatment 
main effect. We consider two patterns of means below to illustrate. 


                Situation 1                      Situation 2
            Pretest   Posttest               Pretest   Posttest
Treat.         70        80       Treat.        65        80
Control        60        70       Control       60        68


In situation 1 the treatment main effect would probably be significant, because
there is a difference of 10 in the row means. However, the difference of 10 on the 
posttest just transferred from an initial difference of 10 on the pretest. There is not 
a differential change in the treatment and control groups here. On the other hand, in 
situation 2 even though the treatment group scored higher on the pretest, it in- 
creased 15 points from pre to post while the control group increased just 8 points. 
That is, there was a differential change in performance in the two groups. But, re- 
call from Chapter 4 that one way of thinking of an interaction effect is as a “differ- 
ence in the differences.” This is exactly what we have in situation 2, hence a signifi- 
cant interaction effect. 
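The contrast between the two situations can be made concrete with a few lines of arithmetic. The sketch below is only an illustration of the "difference in the differences" idea using the cell means from the two patterns above; the function names are hypothetical and no significance test is involved.

```python
def difference_in_differences(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Treatment group gain minus control group gain (the interaction contrast)."""
    treat_gain = treat_post - treat_pre
    ctrl_gain = ctrl_post - ctrl_pre
    return treat_gain - ctrl_gain


def row_mean_difference(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Treatment row mean minus control row mean (the treatment main effect contrast)."""
    return (treat_pre + treat_post) / 2 - (ctrl_pre + ctrl_post) / 2


situation_1 = (70, 80, 60, 70)  # treatment pre, treatment post, control pre, control post
situation_2 = (65, 80, 60, 68)

did_1 = difference_in_differences(*situation_1)
did_2 = difference_in_differences(*situation_2)
main_1 = row_mean_difference(*situation_1)

print(did_1, did_2, main_1)
```

In situation 1 the row means differ by 10 although the two gains are identical, whereas in situation 2 the gains differ by 7 points, and it is this second quantity that the interaction F addresses.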

Second, Huck and McLean (1975) note that the interaction F from the repeated 
measures ANOVA is identical to the F ratio one would obtain from an ANOVA on 
the gain (difference) scores. Finally, whenever the regression coefficient is not 
equal to 1 (generally the case), the error term for ANCOVA will be smaller than for 
the gain score analysis and hence the ANCOVA will be a more sensitive or power- 
ful analysis. 
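The claim about the two error terms can be checked with a short population-level derivation. This is a sketch under simplifying assumptions that are not stated in the text: within-group variances $\sigma_x^2$ (pretest) and $\sigma_y^2$ (posttest) and a within-group correlation $\rho$ that are the same in every group, with no allowance for the one degree of freedom ANCOVA spends on estimating the slope. Writing $\beta = \rho\sigma_y/\sigma_x$ for the common regression slope,

$$
\begin{aligned}
\text{gain score error variance} &= \sigma_y^2 + \sigma_x^2 - 2\rho\sigma_x\sigma_y \\
\text{ANCOVA error variance} &= \sigma_y^2(1 - \rho^2) \\
\text{difference} &= \sigma_x^2 - 2\rho\sigma_x\sigma_y + \rho^2\sigma_y^2 = (\sigma_x - \rho\sigma_y)^2 = \sigma_x^2(1 - \beta)^2 \ge 0
\end{aligned}
$$

Under these assumptions the difference is zero only when $\beta = 1$, which is the case the text singles out; otherwise the ANCOVA error term is the smaller of the two.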

Although not discussed in the Huck and McLean paper, we would like to add a 
measurement caution against the use of gain scores. It is a fairly well known mea- 
surement fact that the reliability of gain (difference) scores is generally not good. 
To be more specific, as the correlation between the pretest and posttest scores ap- 
proaches the reliability of the test, the reliability of the difference scores goes to 0. 
The following table from Thorndike and Hagen (1977) quantifies things: 
                       Average reliability of two tests

Correlation
between tests     .50     .60     .70     .80     .90     .95
    .00           .50     .60     .70     .80     .90     .95
    .40           .17     .33     .50     .67     .83     .92
    .50           .00     .20     .40     .60     .80     .90
    .60                   .00     .25     .50     .75     .88
    .70                           .00     .33     .67     .83
    .80                                   .00     .50     .75
    .90                                           .00     .50
    .95                                                   .00


If our dependent variable is some noncognitive measure, or a variable derived
from a nonstandardized test (which could well be of questionable reliability), then
a reliability of about .60 or so is a definite possibility. In this case, if the correlation
between pretest and posttest is .50 (a realistic possibility), the reliability of the
difference scores is only .20! On the other hand, the above table also shows that if our
measure is quite reliable (say .90), then the difference scores will be reliable for
moderate pre-post correlations. For example, for reliability = .90 and pre-post
correlation = .50, the reliability of the difference scores is .80.
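The entries quoted above can be reproduced with the usual formula for the reliability of a difference score, (average reliability - correlation) / (1 - correlation). The sketch below assumes that formula (which presumes the two tests have equal variances) rather than taking it from the text; the function name is hypothetical.

```python
def difference_score_reliability(avg_reliability, correlation):
    """Reliability of a difference score, assuming equal variances for the two tests.

    avg_reliability: average of the two tests' reliabilities
    correlation:     correlation between the two tests (must be below 1)
    """
    if not 0 <= correlation < 1:
        raise ValueError("correlation must be in [0, 1)")
    return (avg_reliability - correlation) / (1 - correlation)


# The two cases discussed in the text.
questionable = difference_score_reliability(0.60, 0.50)
reliable = difference_score_reliability(0.90, 0.50)
print(round(questionable, 2), round(reliable, 2))
```

When the correlation equals the average reliability the numerator is zero, which is the point made above about the reliability of the difference scores going to 0.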


7.11 AN ALTERNATIVE TO THE JOHNSON-NEYMAN 
TECHNIQUE 


We consider hypothetical data from Huitema (1980, p. 272). The effects of two 
types of therapy are being compared on an aggressiveness score. The covariate (x)
is the score on a sociability scale. Since the Johnson-Neyman technique is still (18
years after the first edition of this text) not available on SAS or SPSS, we consider 
an alternative analysis that does shed some light. Recall that a violation of the ho- 
mogeneity of regression slopes assumption meant there was a covariate by group 
interaction. Thus, one way of investigating the nature of this interaction would be 
to set up a factorial design, with groups being one of the factors and two or more 
levels for the covariate (other factor), and run a regular two way ANOVA. This pro- 
cedure is not as desirable as the Johnson-Neyman technique for two reasons: (1) 
the Johnson-Neyman technique is more powerful, and (2) the Johnson—Neyman 
technique enables us to determine where the group differences are for all levels of 
the covariate, whereas the factorial approach can only check for differences for the 
levels of the covariate included in the design. Nevertheless, at least most research- 
ers can easily do a factorial design, and this does yield useful information. 

For the Huitema data, although there is a strong linear relationship in each 
group, the assumption of equality of slopes is not tenable (Figure 7.4 shows why). 
Therefore, covariance is not appropriate, and we split the subjects into three levels
for sociability: low (1-4), medium (4.5-7.5) and high (8-11), and set up the following
2 x 3 ANOVA on aggressiveness:


                       SOCIABILITY
                LOW     MEDIUM     HIGH
THERAPY 1
THERAPY 2


FIGURE 7.4 Scatterplots and Summary Statistics for Each Therapy Group

Therapy 1: N = 15, R = .776, P < .001
    REGRESSION LINE: y = 9.8568 + .20859 * x     RES. MS = .28921
    x: MEAN = 5.8000, S.D. = 3.0519     y: MEAN = 11.067, S.D. = .82086

Therapy 2: N = 15, R = .977, P < .001
    REGRESSION LINE: y = 3.1554 + 1.2370 * x     RES. MS = .55079
    x: MEAN = 5.5333, S.D. = 2.6623     y: MEAN = 10.000, S.D. = 3.3700


TABLE 7.6
Results From SPSS for Windows 12.0
for 2 x 3 Factorial Design on Huitema Data


Placeholder for T0706 from p. 331 of previous edition

Results from the run on SPSS for Windows 12.0 are presented in Table
7.6. They show, as expected, that there is a significant sociability by therapy
interaction (F = 19.735). The nature of this interaction can be gauged by examining the
means for the SOCIAL*THERAPY table. These show that for low sociability
therapy group 1 is more aggressive, whereas for high sociability therapy group 2 is
more aggressive. The results from the Johnson-Neyman analysis for this data,
presented in the first edition of this text (p. 179), show that more precisely there is no
significant difference in aggressiveness for sociability scores between 6.04 and
7.06.
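A rough check on this pattern is to ask where the two fitted regression lines cross. The sketch below uses the regression lines reported in Figure 7.4 (y = 9.8568 + .20859x for therapy 1 and y = 3.1554 + 1.2370x for therapy 2). It is not the Johnson-Neyman technique, only a descriptive companion to it, and the function names are hypothetical.

```python
def crossover(a1, b1, a2, b2):
    """Covariate value at which two regression lines a + b*x predict the same score."""
    if b1 == b2:
        raise ValueError("parallel lines never cross")
    return (a1 - a2) / (b2 - b1)


def predicted_gap(x, a1, b1, a2, b2):
    """Predicted therapy 1 minus therapy 2 aggressiveness at sociability x."""
    return (a1 + b1 * x) - (a2 + b2 * x)


# (intercept, slope) for each therapy group, as reported in Figure 7.4.
THERAPY_1 = (9.8568, 0.20859)
THERAPY_2 = (3.1554, 1.2370)

x_cross = crossover(*THERAPY_1, *THERAPY_2)
gap_low = predicted_gap(2.0, *THERAPY_1, *THERAPY_2)    # a low sociability score
gap_high = predicted_gap(10.0, *THERAPY_1, *THERAPY_2)  # a high sociability score
print(round(x_cross, 2), round(gap_low, 2), round(gap_high, 2))
```

The crossing point falls inside the 6.04 to 7.06 interval, and the sign of the predicted gap flips from low to high sociability, which agrees with the direction of the cell means described above.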


7.12 USE OF SEVERAL COVARIATES 


What is the rationale for using several covariates? First, the use of several 
covariates will result in greater error reduction than can be obtained with just one 
covariate. The error reduction will be substantially greater if there are low 
intercorrelations among the covariates. In this case each of the covariates will be 
removing a somewhat different part of the error variance from the dependent vari- 
able. Also, with several covariates we can make a better adjustment for initial dif- 
ferences among groups. 

Recall that with one covariate simple linear regression was involved. With several
covariates (predictors), multiple regression is needed. In multiple regression
the linear combination of the predictors that is maximally correlated with the
dependent variable is found. The multiple correlation (R) is a maximized Pearson
correlation between the observed scores on y and their predicted scores, $R = r_{y\hat{y}}$.
Although R is more complex it is a correlation and hence $R^2$ can be interpreted as
“proportion of variance accounted for.” Also, we will have regression coefficients
for each of the covariates (predictors). Below we present a table comparing the
single and multiple covariate cases:
TABLE 7.7
SPSS MANOVA Control Lines for ANCOVA on Sesame Street Data 
With Two Covariates 


TITLE 'SESAME ST. DATA-2 COVARIATES'.
DATA LIST FREE/ SITE PRENUMB PRERELAT POSTNUMB. 
BEGIN DATA. 


DATA LINES 


END DATA. 
MANOVA PRENUMB PRERELAT POSTNUMB BY SITE(1,3)/ 
ANALYSIS POSTNUMB WITH PRENUMB PRERELAT/ 

PRINT = PMEANS/ 

DESIGN/ 

ANALYSIS = POSTNUMB/ 

DESIGN = PRENUMB+PRERELAT, SITE, PRENUMB BY SITE+ 
PRERELAT BY SITE/ 

ANALYSIS = PRENUMB PRERELAT/. 


Error Reduction
    One Covariate:        primarily determined by the simple correlation; $r_{xy}^2$ is the
                          within variance on y accounted for by x
    Multiple Covariates:  determined by the multiple correlation; $R^2$ is the within
                          variance on y accounted for by the set of covariates

Adjustment of Means
    One Covariate:        $\bar{y}_j^* = \bar{y}_j - b(\bar{x}_j - \bar{x})$, where b is the assumed common slope
    Multiple Covariates:  $\bar{y}_j^* = \bar{y}_j - b_1(\bar{x}_{1j} - \bar{x}_1) - b_2(\bar{x}_{2j} - \bar{x}_2) - \cdots - b_k(\bar{x}_{kj} - \bar{x}_k)$


where the $b_i$ are the regression coefficients, $\bar{x}_{1j}$ is the mean for covariate 1 in group
j, $\bar{x}_{2j}$ is the mean for covariate 2 in group j, etc., and the $\bar{x}_i$ are the grand means for
the covariates.
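The adjustment of means is simple enough to check by hand or with a few lines of code. The sketch below is a hypothetical helper, not part of SPSS. The numbers are the ones quoted in this chapter's annotations to the Sesame Street printouts: one covariate (slope .686) and two covariates (coefficients .564 and .622).

```python
def adjusted_mean(group_mean, slopes, group_cov_means, grand_cov_means):
    """Posttest mean adjusted for one or more covariates.

    slopes:           pooled regression coefficients, one per covariate
    group_cov_means:  this group's mean on each covariate
    grand_cov_means:  grand mean on each covariate
    """
    adjustment = sum(
        b * (xj - x) for b, xj, x in zip(slopes, group_cov_means, grand_cov_means)
    )
    return group_mean - adjustment


# One covariate (PRENUMB).
one_cov = adjusted_mean(30.083, [0.686], [22.4], [21.67])

# Two covariates (PRENUMB and PRERELAT).
two_cov = adjusted_mean(25.437, [0.564, 0.622], [16.563, 8.563], [21.670, 10.14])

print(round(one_cov, 2), round(two_cov, 2))
```

A group that starts below the grand mean on a covariate has its posttest mean adjusted upward, and a group that starts above it is adjusted downward, provided the regression coefficient is positive.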


7.13 COMPUTER EXAMPLE WITH TWO COVARIATES


To illustrate running an ANCOVA with more than one covariate, we reconsider the 
Sesame Street data set used in Section 7.9. Again we shall be interested in site dif- 
ferences on POSTNUMB, but now we use two covariates: PRENUMB and 
PRERELAT (pretest on knowledge of relational terms—amount, size, and posi- 
tion relationship—maximum score of 17). Before we give the control lines for run- 
ning the analysis, we need to discuss in more detail how to set up the lines for test- 
ing the homogeneity assumption. For one covariate this is equality of regression 
slopes. For two covariates it is parallelism of the regression planes, and for more 
than two covariates it involves equality of regression hyperplanes. 


TABLE 7.8 
Printout from SPSS MANOVA for Sesame Data with Two Covariates 


Placeholder for T 0708 from p. 334 of previous edition. 


(1) This test indicates significant SITE differences at .05 level.
(2) These are the regression coefficients.
(3) These are the adjusted means, which would be obtained as follows:

$$
\begin{aligned}
\bar{y}_j^* &= \bar{y}_j - b_1(\bar{x}_{1j} - \bar{x}_1) - b_2(\bar{x}_{2j} - \bar{x}_2) \\
&= 25.437 - .564(16.563 - 21.670) - .622(8.563 - 10.14) \\
&= 29.30
\end{aligned}
$$

(4) This test indicates parallelism of the regression planes is tenable at the .05 level.
It is important to recall that a violation of the assumption means there is a
covariate by treatment (group) interaction. If the assumption is tenable this means 
the interaction will not be significant. Therefore, what one does in SPSS 
MANOVA is to set up an effect involving the interaction (for one covariate), and 
then test whether this effect is significant. If the effect is significant, it means the 
assumption is not tenable. 

For more than one covariate, as in the present case, there is an interaction term 
for each covariate. The effects are lumped together and then we test whether the 
combined interactions are significant. Before we give a few examples, note that 
BY is the keyword used by SPSS to denote an interaction, and + is used to lump ef- 
fects together. 

We show the control lines for testing the homogeneity assumption for two 
covariates and for three covariates. Denote the dependent variable by y, the 
covariates by x1 and x2 and the grouping variable by gp. The control lines are


ANALYSIS = Y/ 
DESIGN = X1+X2,GP,X1 BY GP+X2 BY GP / 


Now, suppose there were three covariates. Then the control lines will be: 


ANALYSIS = Y / 
DESIGN = X1+X2+X3,GP,X1 BY GP+X2 BY GP+X3 BY GP /
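The pattern in these DESIGN lines extends to any number of covariates: the covariates lumped with +, then the grouping variable, then the covariate by group interactions lumped with +. The sketch below is a hypothetical Python helper that only assembles the text of such a line. It is not an SPSS facility, and the default group name GP simply follows the notation used above.

```python
def homogeneity_design(covariates, group="GP"):
    """Build the DESIGN line that tests the homogeneity assumption.

    covariates: list of covariate names, e.g. ["X1", "X2"]
    group:      name of the grouping variable
    """
    if not covariates:
        raise ValueError("at least one covariate is required")
    lumped_covariates = "+".join(covariates)
    lumped_interactions = "+".join(f"{c} BY {group}" for c in covariates)
    return f"DESIGN = {lumped_covariates},{group},{lumped_interactions} /"


two = homogeneity_design(["X1", "X2"])
three = homogeneity_design(["X1", "X2", "X3"])
print(two)
print(three)
```

The last effect on the generated line is the one whose F test addresses the homogeneity assumption.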


The control lines for running the ANCOVA on the Sesame Street data with the 
covariates of PRENUMB and PRERELAT are given in Table 7.7. In Table 7.8 we 
present selected output from the SPSS analysis of covariance. 


7.14 SUMMARY 


1. In analysis of covariance a linear relationship is assumed between the de- 
pendent variable and the covariate(s). 

2. ANCOVA is directly related to the two basic objectives in experimental de- 
sign of (a) eliminating systematic bias and (b) reduction of error variance. 

While ANCOVA does not eliminate bias, it can reduce bias. The use of several 
covariates with low intercorrelations will substantially reduce error variance. 

3. Limit the number of covariates (C) so that

$$\frac{C + (J - 1)}{N} < .10$$

where J is the number of groups and N is total sample size.
4. A numerical example is given to show the intimate relationship between 
ANCOVA and the results for ANOVA on the same data. 


TABLE 7.9 


Placeholder for T0709 from p. 337 of previous edition. 
5. Measurement error on the covariate causes loss of power in randomized designs,
and can lead to seriously biased treatment effects in non-randomized designs.

6. In examining printout from the statistical packages, first make two checks to 
determine whether covariance is appropriate: (1) check that there is a linear rela- 
tionship between the covariate and the dependent variable and (2) check that the 
regression slopes are equal. If either of these is not true, then covariance is not ap- 
propriate. In particular, if (2) is not true then the Johnson—Neyman technique 
should be considered. 

7. Several cautions are given concerning the use of analysis of covariance with 
intact groups. 

8. Three ways of analyzing a k group pretest-posttest design are: ANOVA on 
the difference scores, analysis of covariance, and a two way repeated measures 
ANOVA. Articles by Huck and McLean (1975) and by Jennings (1988) show that 
ANCOVA is generally the preferred method of analysis. 

9. Although the Johnson-Neyman technique is preferred when the slopes are
not equal, it is still not available on SAS or SPSS. A violation of the equal slopes
assumption means there is a group by covariate interaction effect. Because of this,
we illustrated, in Section 7.11, use of a two way ANOVA to get at the nature of this
interaction.

10. We showed how ANCOVA can be done using multiple regression. By 
dummy coding group membership and appropriate multiplication we obtained 
both the test for homogeneity of regression slopes and the ANCOVA. 


EXERCISES 


1. A social psychological study by Novince (1977) examined the effect of be- 
havioral rehearsal, and behavioral rehearsal plus cognitive restructuring 
(combination treatment) on reducing anxiety and facilitating social skills 
for female college freshmen. The 33 subjects were randomly assigned (11 
each) to either BH, a control group (group 2), or BH + CR. The subjects 
were pretested and posttested on several variables. The scores for the 
avoidance variable are given as follows: 
    BEHAVIORAL                                BEHAVIORAL REHEARSAL +
    REHEARSAL             CONTROL             COGNITIVE RESTRUCTURING
Avoid    Preavoid     Avoid    Preavoid       Avoid    Preavoid

  91        70         107       115           121        96
 107       121          76        77           140       120
 121        89         116       111           148       130
  86        80         126       121           147       145
 137       123         104       105           139       122
 138       112          96        97           121       119
 133       126         127       132           141       104
 127       121          99        98           143       121
 114        80          94        85           120        80
 118       101          92        82           140       121
 114       112         128       112            95        92


Table 7.9 shows selected printout from an ANCOVA on SPSS for Windows 
7.5 (top two thirds of printout). 

(a) Is ANCOVA appropriate for this data? Explain. 

(b) If ANCOVA is appropriate, then do we reject the null hypothesis of 
equal adjusted population means at the .05 level? 

(c) The bottom portion of the printout shows the results from an ANOVA 
on just avoidance. Note that the error term is 280.07. The error term for the 
ANCOVA is 111.36. How are the two error terms fundamentally related? 


2. (a) Run an ANOVA on the difference scores for the data in exercise 1. 
(b) Compare the error term for that analysis vs the error term for the 
ANCOVA on the same data. Relate these results to the discussion in Sec- 
tion 7.10. 


3. This question relates the use of a pretest as covariate to experimental de- 
sign considerations. Suppose in a counseling study eight subjects were ran- 
domly assigned to each of three groups. The subjects were pretested and 
posttested on client satisfaction, which served as the dependent variable. 
(a) What is the main reason for using the pretest here as a covariate? 

(b) In what other way might the covariate be useful? 
(c) What effect would the possibility of pretest sensitization have on your 
decision to use a pretest in this study? 


4. An analysis of variance is run on three intact groups and a significant dif- 
ference is found at the .05 level. The pattern of means is 
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GP 1 GP 2 GP 3 
COVARIATE 120 100 110 
DEP. VAR. 70 60 65 


A few days later the investigator, after talking to a colleague, runs an 
ANCOVA on this data and no longer finds significance at the .05 level. The 
correlation between the dependent variable and the covariate is .61 and the 
homogeneity of regression slopes assumption is found to be tenable. Ex- 
plain what has happened here, and relate this to the discussion in section 
7.3.


5. A study by Huck and Bounds (1972) examined whether the grade assigned
an essay test is influenced by handwriting neatness. They hypothesized
that an interaction effect would occur, with graders who have neat hand- 
writing lowering the essay grade while graders with messy handwriting 
will not lower the grade. Students in an Educational Measurement class at 
the University of Tennessee served as subjects. Sixteen were classified as 
having neat handwriting and 18 were classified as messy handwriters. 
Each of these 34 subjects received two one page essays. A person with av- 
erage handwriting neatness copied the first (better) essay. The second es- 
say was copied by two people, one having neat handwriting and one having 
messy handwriting. Each subject was to grade each of the two essays on a
scale from 0 to 20. Within the neat handwriters, half of them were ran- 
domly assigned to receive a neatly written essay to grade and the other half 
a messy essay. The same was done for the messy handwriters who were 
acting as graders. The grade assigned to essay 1 served as the covariate in 
this study. Means and adjusted means are given below for groups: 


                 Neat Essay                   Messy Essay

Neat       Essay 1  x̄ = 14.75           Essay 1  x̄ = 15.00
Writer     Essay 2  x̄ = 13.00           Essay 2  x̄ = 9.75
                    sd = 2.51                     sd = 3.62
           Adj. Mean = 13.35             Adj. Mean = 9.98

Messy      Essay 1  x̄ = 16.33           Essay 1  x̄ = 15.78
Writer     Essay 2  x̄ = 12.11           Essay 2  x̄ = 12.44
                    sd = 3.14                     sd = 2.07
           Adj. Mean = 11.70             Adj. Mean = 12.30


The following is from their RESULTS section (Huck & Bounds, 1972): 


Prior to using analysis of covariance, the researchers tested the assumption of homogeneous
within-group regression coefficients. Since this preliminary test proved
to be nonsignificant (F = 1.76, p > .10), it was appropriate to use the conventional 
covariance analysis. 
Results of the 2 x 2 analysis of covariance revealed that neither main effect was
significant. However, an interaction between the legibility of the essay and the 
handwriting neatness of the graders was found to be significant (F = 4.49, p < .05). 
To locate the precise nature of this interaction, tests of simple main effects (Kirk, 
1968, p. 481) were used to compare the two treatment conditions, first for graders 
with neat handwriting and then a second time for graders with messy handwriting. 
Results indicated that neat writers gave higher grades to the neat essay than to the 
messy essay (F = 6.13, p < .05), but that messy handwriters did not differentiate 
significantly between the two essays. (pp. 281-82) 


(a) From what is mentioned in the above RESULTS section, can we be 
confident that analysis of covariance is appropriate? Explain. 

(b) What is the main reason for using analysis of covariance in this study? 
(c) Should the investigators have been concerned about the homogeneity 
of cell population variances in this study? Why, or why not? 

(d) Estimate the effect size for the interaction effect (see Section 4.6). Is it 
large or fairly large? Relate this to the sample size in the study and the sig- 
nificance that was found for the interaction effect. 


. Determine whether ANCOVA is appropriate for the HEADACHE data, using
UNCOMF as the dependent variable and PREUNCOMF as the covariate. What
checks did you make?


. What is the main reason for using ANCOVA in a randomized study? 


. Cochran, in his 1957 review article on ANCOVA, made the statement that 
ANCOVA will not be useful when the correlation between the dependent 
variable and the covariate is less than .3 in absolute value. Why did he say 
this? 
8.1 INTRODUCTION


In the social sciences, nested data structures are very common. As Burstein noted,
“Most of what goes on in education occurs within some group context” (1980).
Nested data (which yield correlated observations) occur whenever subjects are
clustered together in groups, as is frequently found in social science research. For
example, students in the same school will be more alike than students from
different schools, which implies some non-independence. The responses of
patients who are counseled together in therapy groups will depend to some extent
on the dynamics of their group, resulting in a within-therapy-group dependency
(Kreft & de Leeuw, 1998). Yet one of the assumptions made in many of the
statistical techniques (including regression, ANOVA, etc.) used in the social
sciences is that the observations are independent.

Kenny and Judd noted that while non-independence is commonly treated as a
nuisance, there are still “many occasions when nonindependence is the substantive
problem that we are trying to understand in psychological research” (1986, p. 431). 
The authors refer to researchers interested in studying social interaction. Kenny 
and Judd note that social interaction by definition implies non-independence. If a 
researcher is interested in studying social interaction, or even a plethora of other 
social psychology constructs, the non-independence is not so much a statistical 
problem to be surmounted as a focus of interest. 

Additional examples of dependent data can be found for employees working to- 
gether in organizations, and even citizens within nations. These scenarios, as well 
as students nested within schools and patients within therapy groups, provide ex- 
amples of two-level designs. The first level comprises the units that are grouped to- 
gether at the second level. For instance, students (level one) would be considered 
as nested within schools (level two), and patients (level one) are nested within 
counseling groups (level two). 

This nesting, or clustering, does not always involve only two levels. A commonly
encountered three-level design found in educational research involves students
(level one) nested within classrooms (level two), clustered within
schools (level three). Individuals (level one) are “nested” within families (level 
two) that are clustered in neighborhoods (level three). Patients (level one) are fre- 
quently counseled in groups (level two) that are clustered within counseling cen- 
ters (level three). There is an endless list of such groupings. When data are clus- 
tered in these ways, use of multilevel modeling should be considered. 

In the late 1970s, estimation techniques and programs were developed to facili- 
tate use of multilevel modeling (Raudenbush & Bryk, 2002; Arnold, 1992). Before 
this time, researchers would tend to use single-level regression models to investi- 
gate relationships between relevant variables describing the different levels despite 
the violation of the assumption of independence. This would be problematic for a 
variety of reasons. 


8.2 PROBLEMS USING SINGLE-LEVEL ANALYSES 
OF MULTILEVEL DATA 


A researcher might be interested in the relationship between students’ test scores
and characteristics of the schools that they attended. The dataset might consist of
student and school descriptors from students who were randomly selected from a
random selection of schools. When investigating the question of interest, a
researcher choosing to ignore the inherent dependency in his or her data would
have two analytical choices (other than the use of multilevel modeling). The
researcher could aggregate the student data to the school level and use school data
as the level of analysis. This would mean that the outcome in a single-level
regression might be the school’s average student score, with predictors consisting
of school descriptors and average school characteristics summarized across
students within each school. The primary problems with such an analysis are that
valuable information about the variability of students’ scores within schools is
lost, statistical power is decreased, and the ecological validity of the inferences is
compromised (Hox, 2002; Kreft & de Leeuw, 1998).

Alternatively, the researcher could disaggregate the student- and school-level 
data. This modeling would have involved using students as the unit of analysis and 
ignoring the non-independence of students’ scores within each school. In the sin- 
gle-level regression that would be used with disaggregated data, the outcome 
would be the student’s test score with predictors including student and school char- 
acteristics. The problem in this analysis is that values for school descriptors would 
be the same across students within the same school. Using this disaggregated data, 
and thus ignoring the non-independence of the students’ scores within each school, 
artificially deflates the estimated variability of the school descriptor. This would 
then affect the validity of the statistical significance test of the relationship be- 
tween the student outcome and the school descriptor and inflate the associated 
Type I error rate. The stronger the relationship between students’ scores for stu- 
dents within each school, the worse the impact on the Type I error rate. 

There is a measure of the degree of dependence between individuals that is
called the intra-class correlation (ICC). The more that characteristics of the
context (say, school) in which an individual (student) finds himself or herself have
an effect on the outcome of interest, the stronger will be the ICC. In other words,
the more related to the outcome are the experiences of individuals within each
grouping, the stronger will be the ICC (Kreft & de Leeuw, 1998). For two-level
datasets (in which there is only one level of grouping), the ICC can be interpreted
as the proportion of the total variance in the outcome that occurs between the
groups (as opposed to within the groups).
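
As a minimal sketch of the two-level interpretation above, the ICC can be computed
from a between-group and a within-group variance component. The function name
and the numeric values are illustrative assumptions, not estimates from any dataset
in this chapter.

```python
def intraclass_correlation(between_var: float, within_var: float) -> float:
    """Proportion of the total outcome variance that lies between groups."""
    if between_var < 0 or within_var < 0:
        raise ValueError("variance components must be non-negative")
    total = between_var + within_var
    if total == 0:
        raise ValueError("total variance must be positive")
    return between_var / total


# Hypothetical variance components: 10 between schools, 90 within schools.
icc = intraclass_correlation(between_var=10.0, within_var=90.0)
print(f"ICC = {icc:.2f}")  # prints "ICC = 0.10"
```

With these made-up values, one tenth of the total variance lies between schools.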

Snijders and Bosker (1999, p. 151) indicate that 


“In most social science research, the intraclass correlation ranges between 0 and .4, 
and often narrower bounds can be identified.” 


Even an ICC that is slightly larger than zero can have a dramatic effect on Type I
error rates, as can be seen in the table below, taken from Scariano and Davenport
(1987).

Note from the table that for an ICC of only .01, with 3 groups and 30 subjects
per group, the actual alpha is inflated to .0985 for a one way ANOVA. For a 3
group, n = 30 scenario in which ICC = .10, the actual alpha is .4917!
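
The inflation described above can be checked with a small Monte Carlo sketch. It
assumes three groups with no true mean differences, gives every observation in a
group a shared random component so that the within-group correlation equals the
chosen ICC, and counts how often a one way ANOVA rejects at the nominal .05
level. The function names, the seed, and the number of replications are
illustrative choices; the critical value uses the closed form available when the
numerator has 2 degrees of freedom, so the sketch is limited to three groups.

```python
import random


def f_critical_df1_2(df2: int, alpha: float) -> float:
    """Critical value of F(2, df2).

    With 2 numerator df the upper tail area is (1 + 2f/df2) ** (-df2/2),
    which is inverted here to get the cutoff for a given alpha.
    """
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


def anova_f(groups: list[list[float]]) -> float:
    """One way ANOVA F statistic for a list of groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def rejection_rate(icc: float, n: int, reps: int = 2000, seed: int = 1) -> float:
    """Share of null datasets (3 groups of size n) rejected at alpha = .05."""
    rng = random.Random(seed)
    crit = f_critical_df1_2(df2=3 * n - 3, alpha=0.05)
    rejections = 0
    for _ in range(reps):
        groups = []
        for _ in range(3):
            shared = rng.gauss(0.0, icc ** 0.5)  # component common to the group
            groups.append(
                [shared + rng.gauss(0.0, (1.0 - icc) ** 0.5) for _ in range(n)]
            )
        if anova_f(groups) > crit:
            rejections += 1
    return rejections / reps


# Three groups of 30 with ICC = .10; the estimate should fall near the
# tabled value of .4917, up to simulation error.
print(rejection_rate(icc=0.10, n=30))
```

Setting `icc=0.0` in the same call gives a rejection rate near the nominal .05,
which is the independent-observations column of the table.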

Actual Type I Error Rates for Correlated Observations in a One Way
ANOVA (Nominal α = .05)

                    Intraclass Correlation (ICC)

  m      n      .00      .01      .10      .30      .50
  2      3     .0500    .0522    .0740    .1402    .2374
        10     .0500    .0606    .1654    .3729    .5344
        30     .0500    .0848    .3402    .5928    .7205
       100     .0500    .1658    .5716    .7662    .8446
  3      3     .0500    .0529    .0837    .1866    .3430
        10     .0500    .0641    .2227    .5379    .7397
        30     .0500    .0985    .4917    .7999    .9049
       100     .0500    .2236    .7791    .9333    .9705
  5      3     .0500    .0540    .0997    .2684    .5149
        10     .0500    .0692    .3151    .7446    .9175
        30     .0500    .1192    .6908    .9506    .9888
       100     .0500    .3147    .9397    .9945    .9989

m = number of groups
n = number of observations per group

Fortunately, researchers do not have to choose between the loss of information
associated with aggregation of dependent data and the inflated Type I error rates
associated with disaggregated data. Instead of choosing a level at which to
conduct analyses of clustered or hierarchical data, researchers can use the
technique called “multilevel modeling.” This chapter will provide an introduction
to some of the simpler multilevel models. There are several excellent multilevel
modeling texts available (Raudenbush & Bryk, 2002; Hox, 2002; Snijders & Bosker,
1999; Kreft & de Leeuw, 1998) that will provide the interested reader additional
details as well as discussion of more advanced topics in multilevel modeling.

Several terms are used to describe essentially the same family of multilevel 
models including: multilevel modeling, hierarchical linear modeling, (co)variance 
component models, multilevel linear models, random-effects or mixed-effects 
models and random coefficient regression models, among others (Raudenbush & 
Bryk, 2002; Arnold, 1992). I will use “multilevel modeling” and “hierarchical
linear modeling” in this introduction as they seem to provide the most
comprehensible terms.

In this chapter, formulation of the multilevel model will first be introduced.
This will be followed with an example of a two-level model. In this example, which
involves students within classes, we will first consider what is called an
unconditional model (no predictors at either level). Then we consider adding
predictors at level 1 and then a predictor at level 2. After this example we
consider evaluating the efficacy of treatments on some dependent variable, and
compare the HLM6 analysis to an SPSS analysis of the same data. In conclusion, we
offer some final comments on HLM.


8.3 FORMULATION OF THE MULTILEVEL MODEL


There are two algebraic formulations possible for the hierarchical linear model 
(HLM). The set of equations for each level can be represented separately (while in- 
dexing the appropriate clusters), or alternatively, each level’s equations can be 
combined to provide a single equation. The multiple levels equations formulation 
(Raudenbush & Bryk, 1992; 2002) seems to be the easiest to comprehend for a 
neophyte HLM user in that it simplifies the variance component assignment and 
clearly distinguishes the levels. This formulation also is the one that is imple- 
mented in the multilevel software HLM (Raudenbush, Bryk, Cheong, & Congdon, 
2000). Because the HLM software will be used to demonstrate estimation of HLM 
parameters in this chapter, the multiple levels’ formulation will be used. 


8.4 TWO-LEVEL MODEL—GENERAL FORMULATION 


Before presenting the general formulation of the two-level model, some terminol- 
ogy will first be explained. Raudenbush and Bryk (2002) distinguish between un- 
conditional and conditional models. The unconditional model is one in which no 
predictors (at any of the levels) are included. A conditional model includes at least 
one predictor at any of the levels. 

Multilevel modeling permits the estimation of fixed and random effects 
whereas ordinary least-squares (OLS) regression includes only fixed effects. For 
this reason, it is important to distinguish between fixed and random effects. If a re- 
searcher is interested in comparing two methods of counseling, for example, then 
the researcher would not be interested in generalizing beyond those two methods. 
The inferences would be “fixed” or limited to the two methods under consider- 
ation. Thus counseling method would be treated as a fixed factor. Similarly, if three 
diets (Atkins, South Beach, and Weight Watchers, for instance) were to be com- 
pared, then the diets were not randomly chosen from some population of diets, thus 
once again diets would be a fixed factor. 

On the other hand, consider two situations in which a factor would be considered
random. A researcher might be interested in comparing three specific teaching
methods (fixed factor) used in nine schools randomly selected from some
metropolitan area. The researcher would wish to generalize inferences about
the teaching methods’ effects to the population of schools in this area. Thus, here,
schools is a random factor and teaching method effects would be modeled as
randomly varying across schools. As a second example, consider the design in which
patients are clustered together in therapy groups. Although a researcher would be 
interested in limiting her inferences to the specific counseling methods involved 
(fixed effect), she might want to generalize the inferences beyond the particular 
therapy groups involved. Thus groups would be considered a random factor and 
counseling method effects modeled as randomly varying across groups. For
further discussion of fixed and random effects, the interested reader should
consult Kreft and de Leeuw’s discussion (1998).

This two-level example will involve investigating the relationship between
students’ scores on a Mathematics achievement test in the 12th grade (Math_12) and
a measure of the student’s interest in mathematics (IIM). For students in a certain
classroom, a simple one-level regression model could be tested:


$$Y_i = \beta_0 + \beta_1 X_i + r_i \tag{1}$$


where $Y_i$ is student $i$’s grade 12 Math score, $X_i$ is student $i$’s IIM score,
$\beta_1$ is the slope coefficient representing the relationship between Math_12
and IIM, and $\beta_0$ is the intercept representing the average Math_12 score for
students in the class’s sample given a score of zero on $X_i$. The value of
$\beta_1$ indicates the expected change in Math_12 given a one unit increase in IIM
score. The $r_i$ represents the “residual” or deviation of student $i$’s Math_12
score from that predicted given the values of $\beta_0$, the student’s $X_i$, and
$\beta_1$. It is assumed that $r_i$ is normally distributed with a mean of zero and
a variance of $\sigma^2$, or $r_i \sim N(0, \sigma^2)$.

A brief note should be made about centering the values of a predictor. As
mentioned above, the intercept, $\beta_0$, represents the value predicted for the
outcome, $Y_i$, given that $X_i$ is zero. It is important to ensure that a value of
zero for $X_i$ is meaningful. Interval-scaled variables are frequently scaled so
that they are “centered” around their mean. To center the IIM scores, they would
need to be transformed so that student $i$’s value on $X_i$ was the deviation of
student $i$’s IIM score from the sample mean of the IIM scores. If this centered
predictor were used instead of the original raw IIM score predictor, then the
intercept $\beta_0$ would be interpreted as the predicted Math_12 score for a
student with an average IIM score.

A regression equation just like Equation 1 might be constructed for students in a
second classroom. The relationship between Math_12 and IIM, however, might
differ slightly for the second classroom. Similarly, the coefficients in Equation 1
might be slightly different for other classrooms also. The researcher might be
interested in understanding the source of the differences in the classrooms’
intercepts and slopes. For example, the researcher might want to investigate
whether there might be some classroom characteristic that lessens or overcomes the
relationship between a student’s interest in mathematics (IIM) and their
performance on the math test (Math_12). To investigate this question, the
researcher might obtain a random sample of several classrooms to gather students’
Math_12 and IIM scores as well as measures of classroom descriptors. Now
regression Equation 1 could be calculated for each classroom $j$ such that:


$$Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij} \tag{2}$$


where the estimates for classroom $j$ of the intercept, $\beta_{0j}$, and slope,
$\beta_{1j}$, might differ for each classroom. For each classroom’s set of
residuals, $r_{ij}$, it is assumed that their variances are homogeneous across
classrooms, where $r_{ij} \sim N(0, \sigma^2)$.
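
Fitting Equation 2 separately in each classroom can be sketched as follows. The
records are hypothetical and `fit_line` is an illustrative helper for simple
ordinary least-squares regression; the point is only that each classroom $j$
receives its own intercept and slope.

```python
from collections import defaultdict


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares (intercept, slope) for one predictor."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical records: (classroom id, IIM score, Math_12 score).
records = [
    (1, 35.0, 94.0), (1, 40.0, 99.0), (1, 45.0, 104.0),
    (2, 39.0, 92.0), (2, 42.0, 98.0), (2, 49.0, 112.0),
]

# Group the student records by classroom.
by_class = defaultdict(list)
for class_id, x, y in records:
    by_class[class_id].append((x, y))

# One (intercept, slope) pair per classroom, as in Equation 2.
coefficients = {}
for class_id, pairs in sorted(by_class.items()):
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    coefficients[class_id] = fit_line(xs, ys)

for class_id, (b0, b1) in coefficients.items():
    print(f"classroom {class_id}: intercept = {b0:.2f}, slope = {b1:.2f}")
```

Separate regressions like these treat each classroom in isolation; the multilevel
formulation instead models the classroom intercepts and slopes as outcomes in
their own right.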

The researcher would (hopefully) realize that given a large enough sample of 
classrooms' data, multilevel modeling could be used for this analysis. Math scores 
of students within the same classroom are likely more similar to each other than to 
scores of students in other classrooms. This dependency needs to be modeled ap- 
propriately. This brings us to the multiple sets of equations formulation of the 
HLM. 

If multilevel modeling were to be used in the current example, then students are
nested within classrooms. The higher level of grouping or clustering is associated
with a higher value for the assigned HLM level. Thus, students will be modeled at
level one and classrooms (within which students are “nested”) at level two. The
level one (student level) equation has already been presented (in Equation 2). The
classroom level (level two) equations are used to represent how the lower level’s
regression coefficients might vary across classrooms. The regression coefficients,
$\beta_{0j}$ and $\beta_{1j}$, become response variables modeled as outcomes at the
classroom level (Raudenbush, 1984). Variation in classrooms’ regression equations
implies that the coefficients in these equations each might vary across classrooms.
Variability in the intercept, $\beta_{0j}$, across classrooms would be represented
as one of the level two equations by:


$$\beta_{0j} = \gamma_{00} + u_{0j} \tag{3}$$


where $\beta_{0j}$ is the intercept for classroom $j$, $\gamma_{00}$ is the average
intercept across classrooms (or, in other words, the average Math_12 score across
classrooms, controlling for IIM score) and $u_{0j}$ is classroom $j$’s deviation
from $\gamma_{00}$, where $u_{0j} \sim N(0, \tau_{00})$.

Variability in the relationship between IIM and Math_12 (the slope coefficient)
across classrooms is represented as a level two equation:


$$\beta_{1j} = \gamma_{10} + u_{1j} \tag{4}$$


where $\beta_{1j}$ is the slope for classroom $j$, $\gamma_{10}$ is the average
slope across classrooms (or, in other words, the average measure of the
relationship between Math_12 and IIM scores across classrooms) and $u_{1j}$ is
classroom $j$’s deviation from $\gamma_{10}$, where $u_{1j} \sim N(0, \tau_{11})$.
It is commonly assumed that the intercept and slope ($\beta_{0j}$ and
$\beta_{1j}$) are bivariately normally distributed with covariance $\tau_{01}$
(Raudenbush & Bryk, 2002).

The two level two equations (Equations 3 and 4) are usually more succinctly
presented as:


$$\begin{aligned}
\beta_{0j} &= \gamma_{00} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + u_{1j}
\end{aligned} \tag{5}$$


In this two-level unconditional model (see Equations 2 and 5) there are three
sources of random variability: the level one variability, $r_{ij}$, the level two
(across classrooms) variability in the intercept, $u_{0j}$, and in the slope,
$u_{1j}$. An estimate of the level one variability, $\sigma^2$, is provided.
Estimates of the level-two variance components, $\tau_{00}$ and $\tau_{11}$
(describing the variability of $u_{0j}$ and $u_{1j}$, respectively), can each
be tested for statistical significance.
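
Section 8.3 noted that the equations for the two levels can also be combined into
a single equation. As a sketch of that algebra, using only the symbols already
defined, substituting the level two equations for the intercept and slope into
the level one equation separates the model into a fixed part and a random part
(the brace labels are added here for illustration):

```latex
\begin{aligned}
Y_{ij} &= \beta_{0j} + \beta_{1j} X_{ij} + r_{ij} \\
       &= (\gamma_{00} + u_{0j}) + (\gamma_{10} + u_{1j}) X_{ij} + r_{ij} \\
       &= \underbrace{\gamma_{00} + \gamma_{10} X_{ij}}_{\text{fixed part}}
        + \underbrace{u_{0j} + u_{1j} X_{ij} + r_{ij}}_{\text{random part}}
\end{aligned}
```

The three random terms in the last line are the three sources of random
variability listed above.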

Testing the variability of the intercept across classrooms assesses whether the
variability of classrooms’ intercepts (as measured using the associated variance
component, $\tau_{00}$) differs from zero. If it is inferred that there is not a
significant amount of variability in the intercept (or if it is hypothesized based
on theory that the intercept should not vary across classrooms) then the random
effects variability term, $u_{0j}$, can be taken out of Equation 3 (or Equation 5)
and the intercept is then modeled as fixed.

If, on the other hand, it is inferred that there is a significant amount of
variability in the intercept across classrooms, then variables describing classroom
(level two) characteristics can be added to the model in Equation 3 (or the
equation for $\beta_{0j}$ in Equation 5) to help explain that variability. (This
will be demonstrated later in the chapter.) If the classroom characteristics are
found to sufficiently explain the remaining variability in the intercept, then they
can remain in the modified level two equation for the intercept and the random
effect term can be taken out. With only level two predictors in Equation 3, the
intercept is considered to be modeled as “non-randomly varying” (Raudenbush &
Bryk, 2002).

The variability in the slope coefficients can also be tested by inspecting the
statistical significance of the slope’s variance component, $\tau_{11}$. If it is
inferred that there is a significant amount of variability in the slopes (implying
that the relationship between Math_12 and IIM scores differs across classrooms),
then a classroom predictor could be added to help explain the variability of
$\beta_{1j}$ (in Equation 4 or 5). The addition of a level two predictor to the
equation for the slope coefficient would be termed a “cross-level interaction,”
which is an interaction between variables describing different clustering levels
(Hox, 2002). The variance component remaining (conditional upon including the level
two predictor) can then be tested again to see whether the predictor sufficiently
explained the random variability in slopes. With the addition of a predictor that
does influence the relationship between the level one variable (here, IIM) and the
outcome (Math_12), the remaining variability will be lowered, as will be the
associated variance component, $\tau_{11}$. The values of the level two variance
components (for the intercept and slope coefficients) can be compared with their
values in the unconditional (no predictors) model to assess the proportion of
(classroom) level two variability explained by the predictors that were added to
the model in Equation 5. This, as well as addition of level one and level two
predictors to the model, will be demonstrated further in the next section.
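
The comparison of a conditional variance component with its unconditional value
can be sketched as a proportional reduction. The function name and the two
variance estimates below are hypothetical and serve only to show the arithmetic.

```python
def proportion_variance_explained(unconditional: float, conditional: float) -> float:
    """Proportional reduction in a variance component after adding predictors."""
    if unconditional <= 0:
        raise ValueError("unconditional variance component must be positive")
    return (unconditional - conditional) / unconditional


# Hypothetical estimates of the slope variance component tau_11.
tau11_unconditional = 0.50  # model with no level two predictor
tau11_conditional = 0.30    # model with a level two predictor of the slope

share = proportion_variance_explained(tau11_unconditional, tau11_conditional)
print(f"{share:.0%} of the slope variability is explained")
```

With these made-up values, the level two predictor accounts for 40% of the
variability in slopes across classrooms.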

Now that the formulation of the two-level HLM has been discussed, use of the HLM
software (version 6) will be introduced and then demonstrated using a worked
example. This example will be presented to demonstrate the process of HLM
model-building involving addition of predictors to the two levels of equations, as
well as interpretation of the parameter estimates presented in the HLM output.


8.5 HLM6 SOFTWARE 


Raudenbush, Bryk, Cheong and Congdon's (2004) HLM software, version 6, for
multilevel modeling provides a clear introduction for beginning multilevel
modelers. In addition, it is possible for students to obtain a freeware copy of the
program for simple multilevel analyses. This provides beginners with an easy way to
evaluate for themselves whether they wish to purchase the entire program. (The
website is WWW.SSICENTRAL.COM.)

The SS is an abbreviation for Scientific Software, which produces and distrib- 
utes the HLM software. When you get to this site, click on HLM. You will get a 
dropdown menu, at which point click on free downloads. 

The datasets being analyzed by HLM can be in any of the following formats: 
ASCII, SPSS, SAS portable, or SYSTAT. One of the complications of using HLM 
is that separate data files must be constructed for each level of clustering. For ex- 
ample, when investigating a two-level dataset, the user must construct a level one 
file as well as a level two file. These two files must be linked by a common id on
both files. (This will be TeachId in the example we are about to use.) Data analysis
via HLM involves four steps:


1. Construction of the data files. 

2. Construction of the multivariate data matrix (MDM) file, using the 
data files. 

3. Execution of analyses based on the MDM file. 

4. Evaluation of the fitted model(s) based on a residual file. 


We will not deal with step 4 as this chapter is an introduction to HLM. 


8.6 TWO-LEVEL EXAMPLE—STUDENT 
AND CLASSROOM DATA 


The first step in using HLM to estimate a multilevel model is to construct the
relevant datasets. As mentioned, for a two-level analysis, two data files are
needed: one for each level. The level two ID variable (TeachId in the current
example) must appear in both files.

In this example, the researcher is interested in the relationship between scores
on a 12th grade mathematics test (Math_12) and student and classroom
characteristics. The researcher has information about students’ gender and their
individual scores on an interest in mathematics (IIM) inventory and on the outcome
of interest (Math_12). Thus, Math_12, IIM, and Gender as well as the TeachId
identifying the teacher/classroom for each student must appear in the level one
dataset.

The researcher also has a measure of each classroom’s “resources” (Resource)
that assesses the supplies (relevant to mathematics instruction) accessible to a
classroom of students. Thus the level two dataset will contain Resource and
TeachId. We will use SPSS data files.


Setting up the Datasets for HLM Analysis 


The level one dataset contains the level two id (TeachId) as well as the relevant
student-level descriptors (IIM and Gender) and outcome (Math_12). Another minor
complication encountered when using HLM is that the data should be sorted by
level two id and, within level two id, by student id. A snapshot of the level one
dataset appears in Figure 8.1. The raw data files are given at the end of the
chapter.


[Screenshot: SPSS Data Editor, file 2lvl_student_L1]

teachid   studid   gender   math_12   iim
   1         5       0         94      35
   1         7       1        107      35
   1         9       0         97      42
   2        14       0         92      42
   2        16       0         94      39
   2        19       0         92      49
   2        20       ?        105       ?
   3        22       0         93      40
   3        28       1        101      50
   3        29       1        113      45
   4        32       ?        103      42
   4        35       ?        101      43
   4         ?       1        108      47
   5        47       0         98      37
   5        49       0         95      37

(? marks a value that is not legible in the figure.)

FIGURE 8.1  Two-level model-student level SPSS dataset. 

As can be seen in Figure 8.1, the dataset is set up to mimic the clustering
inherent in the data. Students are “nested” within classrooms that are identified
using the variable TeachId. The first classroom (TeachId = 1) provides
student-level information on three students (students 5, 7 and 9). The second
classroom provides data for four students (14, 16, 19 and 20), and so on. The level
two dataset appears in Figure 8.2 below.


[Screenshot: SPSS Data Editor, file 2lvl_class_L2, with the columns teachid and
resource.]


FIGURE 8.2 Two-level model—classroom level spss dataset. 


In the level two dataset, the classroom information (here, the TeachId and the
classroom’s score on the Resource measure) is listed. Note that the TeachId values
are ordered in both the level one and level two files as required by HLM
software.
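
Before building the MDM file, the sorting and linking requirements described above
can be checked outside of HLM. The following is a minimal sketch, assuming the id
columns have already been read into Python lists; `check_hlm_files` is a
hypothetical helper, not an HLM or SPSS function, and the id values are
illustrative.

```python
def check_hlm_files(level1_rows, level2_rows):
    """Return a list of problems found; an empty list means the checks passed.

    level1_rows: list of (teachid, studid) tuples in file order.
    level2_rows: list of teachid values in file order.
    """
    problems = []
    # Level one must be sorted by level two id, then by student id.
    if level1_rows != sorted(level1_rows):
        problems.append("level one file is not sorted by teachid, then studid")
    if level2_rows != sorted(level2_rows):
        problems.append("level two file is not sorted by teachid")
    if len(set(level2_rows)) != len(level2_rows):
        problems.append("level two file repeats a teachid")
    # Every classroom that appears at level one needs a level two record.
    missing = sorted({t for t, _ in level1_rows} - set(level2_rows))
    if missing:
        problems.append(f"teachid values missing from level two file: {missing}")
    return problems


# Hypothetical id columns (illustration only).
level1 = [(1, 5), (1, 7), (1, 9), (2, 14), (2, 16)]
level2 = [1, 2]
print(check_hlm_files(level1, level2))  # prints []
```

An empty list indicates that the two files are sorted and that every classroom in
the level one file has a matching record in the level two file.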


Setting up the MDM File for HLM Analysis 


Before using HLM, the user needs to first construct what is called the “multivariate 
data matrix” or MDM file that sets up the datasets (regardless of their original for- 
mat) into a format that can be used more efficiently when running the HLM pro- 
gram. (Note that in prior versions of HLM, an SSM file was constructed instead of 
an MDM file). Once the datasets are set up in SPSS (or other relevant statistical 
software programs) the following steps are taken to set up the MDM file. 

1. Once the HLM program is opened, click on FILE, scroll down to “MAKE
NEW MDM FILE” and request STAT PACKAGE INPUT as shown in the following
screen (Figure 8.3).


[Screenshot: WHLM File menu with the entries: Create a new model using an existing
MDM file; Edit/Run old command (.hlm/.mlm) file; Save Model as .emf; Make new MDM
file (submenu: ASCII input, Stat package input); Make new MDM from old MDM
template (.mdmt) file; Display MDM stats; View Output; Graph Equations; Graph
Data; Preferences; Exit.]


FIGURE 8.3 First HLM window for building MDM file. 


2. You must then identify the kind of modeling to be used from the window dis- 
played below. Choose HLM2 for this two-level example and click on OK. 


[Screenshot: Select MDM type dialog. Hierarchical Linear Models: HLM2 (selected),
HLM3. Hierarchical Multivariate Linear Models: HMLM, HMLM2. Cross-classified
Linear Models: HCM2.]


FIGURE 8.4 Second HLM window for building MDM file. 

3. After clicking on HLM2, the “Make MDM - HLM2” window shown in Figure 8.5
will appear. Fill in a filename for the MDM file (under MDM File Name), being sure
to include “.MDM” as the suffix. Given that SPSS datasets are being analyzed, make
sure to change INPUT FILE TYPE to SPSS/WINDOWS before attempting to find the
relevant level one and two data files. Because the first multilevel example
involves students nested within classrooms, be sure to click on “persons within
groups” instead of “measures within persons”. Select the level one data file by
clicking on BROWSE under LEVEL-1 SPECIFICATION and finding the relevant file
(here, called 2lvl_student_L1.SAV). Note that the level one and level two SPSS
files that are going to be used in the analysis should not be open in SPSS when the
user is constructing the MDM file.


[Screenshot: Make MDM - HLM2 dialog. MDM template file buttons: Open mdmt file,
Save mdmt file, Edit mdmt file. MDM File Name (use mdm suffix): math12.mdm.
Input File Type: SPSS/Windows. Nesting of input data: persons within groups
(selected), measures within persons. Level-1 Specification: Browse, Level-1 File
Name (the level one .sav file), Choose Variables; Missing Data? No (selected),
Yes; Delete missing data when: making mdm, running analyses. Level-2
Specification: Browse, Level-2 File Name (the level two .sav file), Choose
Variables. Buttons: Make MDM, Check Stats, Done.]


FIGURE 8.5 Third HLM window for building MDM file. 


4. Click on CHOOSE VARIABLES and select the level two id (TeachId) that
links the level one and two files as well as the relevant level one variables
(Gender, Math_12, and IIM in the current example). Figure 8.6 displays this screen.
In both Figures 8.6 and 8.7 it should read “in MDM” (since we are using version 6
of HLM).

5. Follow the same procedure to identify the relevant level two file for use in
the MDM by clicking on BROWSE and finding the level two .SAV file (here, the
2lvl_class_L2.SAV file). Again, click on CHOOSE VARIABLES and identify the
level two id (TeachId) and the level two variables of interest (just Resource in
the current example). The level two CHOOSE VARIABLES screen appears in Figure
8.7.




Choose variables - HLM2 


FIGURE 8.6 Setting up an MDM File—Choosing Variables at Level One 


Choose variables - HLM2 


FIGURE 8.7 Setting up an MDM file—choosing variables at level two. 


6. Next, you need to click on “Save mdmt file” (to save the MDM template file)
and provide a name for the response (.MDMT) file.

7. Click on “Make MDM” to ensure that the data have been input correctly. An
MS-DOS window will briefly appear (after clicking on MAKE MDM) ending in a
count of the number of level two and level one units. If there seems to be a
disparity between the group and within-group sample sizes, make certain that the
original data files are sorted by the level two id.

8. Before you can exit the MAKE MDM window, you must also click on 
CHECK STATS. Once this is done, you can click on DONE to be brought to the 
HLM window that allows you to build the model to be estimated. 


The Two-Level Unconditional Model 


The unconditional model (including no predictors) is typically the first model estimated in a multilevel analysis. Estimating it partitions the variability in the outcome across the levels. In the current example, this means that the variability among students within classrooms and the variability between classrooms can each be estimated. If there is not a substantial amount of variability between classrooms, then this additional level of clustering might not be needed. 

At level one, in the unconditional model, the outcome (Math_12) for student i in classroom j is modeled only as a function of classroom j’s intercept (i.e., average Math_12 score) and the student's residual: 


$$\text{Math\_12}_{ij} = \beta_{0j} + r_{ij} \qquad (6)$$


At level two, classroom j’s intercept is modeled to be a function of the average intercept (Math_12 score) across classrooms and a classroom residual: 


$$\beta_{0j} = \gamma_{00} + u_{0j} \qquad (7)$$


HLM’s presentation of these equations is very similar to Equations 6 and 7 al- 
though it does not include the relevant i and j subscripts. 
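
As a rough illustration of what Equations 6 and 7 assume, the following Python sketch draws scores from the unconditional model. The function name, the design of 30 classrooms with 5 students each, and the random seed are illustrative assumptions and not part of the example dataset; the parameter values are the estimates reported for these data later in this chapter.

```python
import random
import statistics

def simulate_unconditional(n_classrooms, n_students, gamma00, tau00, sigma2, seed=0):
    """Draw scores from the two-level unconditional model.

    Level two: beta_0j = gamma00 + u_0j, with u_0j ~ N(0, tau00).
    Level one: y_ij    = beta_0j + r_ij, with r_ij ~ N(0, sigma2).
    """
    rng = random.Random(seed)
    data = []  # list of (classroom id, score) pairs
    for j in range(n_classrooms):
        beta0j = gamma00 + rng.gauss(0.0, tau00 ** 0.5)
        for _ in range(n_students):
            data.append((j, beta0j + rng.gauss(0.0, sigma2 ** 0.5)))
    return data

# Hypothetical design; parameter values borrowed from the estimates in this chapter.
scores = simulate_unconditional(30, 5, gamma00=98.04, tau00=26.42, sigma2=50.47)
class_means = [statistics.mean(y for j, y in scores if j == k) for k in range(30)]
print(round(statistics.mean(class_means), 2))
```

Each simulated classroom mean scatters around the overall intercept because of the classroom residual, and each student scatters around his or her classroom mean because of the student residual.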


Estimating Parameters of the Two-Level 
Unconditional Model 


Once the MDM file is built, the HLM window that you can use to build your model 
appears with the newly constructed MDM automatically loaded. 

After the MDM is loaded, a blank formula screen appears with the list of level 
one variables appearing on the left-hand side of the screen. The steps necessary to 
build the unconditional two-level model are as follows: 


FIGURE 8.8 Selecting the outcome variable in HLM. (Screenshot of the WHLM window for math12.mdm, listing the level one variables INTRCPT1, GENDER, MATH_12, and IIM.)


FIGURE 8.9 Unconditional model in HLM for two-level model. (Screenshot of the WHLM window showing the level 1 model MATH_12 = $\beta_0$ + r.)


1. Once the relevant MDM is loaded, the first thing a user must do is choose the relevant outcome variable (here, Math_12). Thus, click on Math_12 and then OUTCOME VARIABLE, as shown in Figure 8.8. 

For a two-level model, HLM automatically presents the two-level “unconditional model” with no predictors at either level one or level two, as shown in Figure 8.9. 




FIGURE 8.10 HLM basic model specification model. (Screenshot of the “Basic Model Specifications - HLM2” dialog, with Normal (Continuous) selected as the distribution of the outcome variable and the title set to “Unconditional two-level model”.)


If you wish to run the model (without saving it) and examine the output, click on 
RUN ANALYSIS. When you click on RUN ANALYSIS, the program will respond 
that the model has not been saved; just click on RUN THE MODEL SHOWN (wait 
several seconds). Then click on FILE and scroll and click on VIEW OUTPUT. By 
doing this you can skip steps 2 through 5 below. 

2. Click on BASIC SETTINGS to change the output file name from the default 
HLM2.TXT to something meaningful (like TWO_LEV.OUT as demonstrated be- 
low). It also helps to change the Title of the model from “no title” to something like 
“Unconditional two-level model” as this will appear on every page of the output. 


FIGURE 8.11 HLM DOS window presenting iterations while HLM is running. (Screenshot listing the value of the likelihood function at successive iterations, from -4.725440E+002 at the first listed iteration to -4.725328E+002 at the last.)




For details about the remaining options, the reader can refer to the HLM manual 
(Raudenbush, Bryk, Cheong, & Congdon, 2004). Click OK. 

3. Save the model by clicking on FILE, then SAVE AS and typing in the 
model’s filename. 

4. Click on RUN ANALYSIS. Once the solution has converged, the MS-DOS window displaying the iterations (see Figure 8.11) will close and bring you back to the HLM model screen. (Based on HLM's defaults, if more than 100 iterations are needed, the user will be prompted whether the program should be allowed to iterate until convergence. For the current dataset, only six iterations were needed until the convergence criteria were met.) 

5. You can view the HLM output by clicking on FILE and then VIEW 
OUTPUT. 


8.7 HLM SOFTWARE OUTPUT 


The output containing the model’s parameter estimates can be viewed if the user 
clicks on File => View Output. The equations match the format of those presented 
in the original HLM window when the model was being built. This part of the out- 
put appears as follows: 


Summary of the model specified (in equation format) 


Level-1 Model 

    Y = B0 + R 

Level-2 Model 

    B0 = G00 + U0 


The listing of the equations’ coefficients is useful when the user needs to inter- 
pret the later output. Following the listing of the equations, the iterations and start- 
ing estimates for the various parameters are listed. After the information about the 
last iteration needed for the model’s estimation, the message “Iterations stopped 
due to small change in likelihood function” appears and the results that follow in- 
clude final parameter estimates. 

The first parameter estimate that appears is the variance, $\sigma^2$, of students’ Math_12 scores within classrooms (assumed homogeneous across classrooms). The value for the current data set is 50.47. The only other variance component that is estimated (in this unconditional model) is the level two variance of classrooms’ intercepts, $\tau_{00}$. The value of the $\tau_{00}$ estimate is 26.42 for the current example. Next, the reliability estimate of $\beta_{0j}$ as an estimate of $\gamma_{00}$ is provided and is .688 for the current data set. This indicates that the classrooms’ intercept estimates tend to provide moderately reliable estimates of the overall intercept (see the HLM manual (Raudenbush, Bryk, Cheong, & Congdon, 2000) and Raudenbush & Bryk’s (2002) HLM text for more information about this form of reliability estimate). 
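
For readers who want a feel for where a reliability of this size comes from, Raudenbush and Bryk (2002) define the reliability of classroom j's sample mean as $\tau_{00}/(\tau_{00} + \sigma^2/n_j)$, and HLM reports the average of these values across classrooms. The Python sketch below is only an approximation under a stated assumption: it plugs in a single common class size of 4.5 (the average class size reported later in this chapter) instead of each classroom's own size, so it will not reproduce .688 exactly. The function name is hypothetical.

```python
def intercept_reliability(tau00, sigma2, n_j):
    """Reliability of a classroom's sample mean: tau00 / (tau00 + sigma2 / n_j)."""
    return tau00 / (tau00 + sigma2 / n_j)

# Assumption: every classroom has the average size of 4.5 students.
approx_reliability = intercept_reliability(tau00=26.42, sigma2=50.47, n_j=4.5)
print(round(approx_reliability, 3))
```

The approximation comes out near .70, in the same range as the .688 that HLM reports from the actual, unequal class sizes.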

In the output, there are two tables containing estimates of the relevant fixed effect(s). The second table lists the fixed effects estimates along with “robust standard errors.” These should be used when summarizing fixed effects. However, if the standard errors in the two fixed effects tables differ substantially, then the user might wish to reconsider whether some of the assumptions underlying the model being estimated are met. The table containing the fixed effect estimate with robust standard errors appears below: 


Final estimation of fixed effects 
(with robust standard errors) 

                                     Standard               Approx. 
Fixed Effect          Coefficient    Error       T-ratio    df     P-value 
For INTRCPT1, B0 
   INTRCPT2, G00      98.043234      1.11        88.163     29     0.000 


The only fixed effect estimated in the two-level unconditional model is the intercept, $\gamma_{00}$ (see Equation 7). The estimate of the average Math_12 value across classrooms is 98.04 with a standard error of 1.11. This coefficient differs significantly from zero (t(29) = 88.163, p < .0001). 

The next part of the output presents the estimates of the variance components. Two variance components are estimated: the variability within classrooms, $\sigma^2$, and the variability between classrooms, $\tau_{00}$. Values of these two components’ estimates were presented earlier in the output (as mentioned above) but also appear in a summary table in the HLM output, as follows: 


Final estimation of variance components: 

                    Standard     Variance 
Random Effect       Deviation    Component    df    Chi-square    P-value 
INTRCPT1,     U0    5.14032      26.42        29    96.72024      0.000 
level-1,      R     7.10441      50.47257 


The variance component estimates match those mentioned earlier. The value of the $\tau_{00}$ estimate can be tested against a value of zero using a test statistic that is assumed to follow a $\chi^2$ distribution (Raudenbush & Bryk, 2002). The results indicate that there is a statistically significant amount of variability in Math_12 scores between classrooms ($\chi^2$(29) = 96.72, p < .0001). This supports the two-level modeling of the clustering of students’ Math_12 scores within classrooms. 

The estimates of the variance components can be combined to provide an addi- 
tional descriptor of the possible nestedness of the data. The intraclass correlation 
provides a measure of the proportion of the variability in the outcomes that exists 
between units of one of the multilevel model's levels. Specifically, for the 
two-level model estimated here, the intraclass correlation provides a measure of 
the proportion of variability in Math_12 between classrooms. The formula for the 
intraclass correlation for a two-level model is: 


$$\rho_{ICC} = \frac{\tau_{00}}{\tau_{00} + \sigma^2} \qquad (8)$$


For the current data set, the intraclass correlation estimate is 


$$\hat{\rho}_{ICC} = \frac{\hat{\tau}_{00}}{\hat{\tau}_{00} + \hat{\sigma}^2} = \frac{26.42}{26.42 + 50.47} = .34,$$


which means that 34% of the variability in Math_12 scores is estimated to lie be- 
tween classrooms (and thus it can be inferred that about 66% lies within class- 
rooms). 
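
The arithmetic in Equation 8 is easy to check directly. The short Python sketch below uses the two variance component estimates from the output; the function name is an illustrative choice.

```python
def intraclass_correlation(tau00, sigma2):
    """Proportion of outcome variance lying between level two units (Equation 8)."""
    return tau00 / (tau00 + sigma2)

icc = intraclass_correlation(tau00=26.42, sigma2=50.47)
print(f"between classrooms: {icc:.2%}, within classrooms: {1 - icc:.2%}")
```

The same function can be reused for any two-level unconditional model by substituting that model's variance component estimates.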

The last information appearing in the HLM output consists of the deviance statistic, which can be used to compare the fit of two models to the data. (It should be noted that to use the deviance statistic to compare models, one model must be a simplified version of the other, in that some of the parameters estimated in the more parameterized model are not estimated but are instead constrained to a certain value in the simplified model.) For the current unconditional model, the deviance statistic’s value is 945.07 with two covariance parameters estimated ($\sigma^2$ and $\tau_{00}$). 

Since a substantial amount of variability was found both within and among 
classrooms, student and classroom descriptors could be added to the model to ex- 
plain some of this variability. We will start by adding two student predictors to the 
level one equation. 


8.8 ADDING LEVEL ONE PREDICTORS TO THE HLM 


The dataset contains two student descriptors: Gender and interest in mathematics (IIM) scores. The researcher was interested in first including IIM scores as a level one predictor of Math_12 scores. To add a level one variable to a model using HLM software, the user must click on the relevant variable. When a variable is clicked on, HLM prompts for the kind of centering that is requested for the variable. The choices include: add variable uncentered, add variable group centered, and add variable grand centered. 


Centering 


Before continuing with the description of the formulation of the model using HLM software, brief mention should be made of centering. It should be remembered that even in a simple, single-level regression model ($Y_i = \beta_0 + \beta_1 X_i + e_i$) including a predictor, $X_i$, the intercept represents the average value of the outcome, $Y_i$, for a person i with a zero on $X_i$. Users of single-level regression can “center” their predictors to ensure that the intercept is meaningful. This centering can be done by transforming subjects’ scores on $X_i$ so that $X_i$ represents a person’s deviation from the sample’s mean on $X_i$. This would transform interpretation of the single-level regression equation's intercept to be the average value of $Y_i$ for someone at the (sample) mean on $X_i$. 
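
The effect of centering on the intercept can be seen in a small numerical sketch. The data below are made up for illustration (a predictor whose observed values are nowhere near zero), and `simple_ols` is a hypothetical helper that fits a one-predictor least-squares line.

```python
import statistics

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data: the predictor never takes a value near zero.
x = [400, 450, 500, 550, 600, 650]
y = [2.1, 2.4, 2.6, 3.1, 3.3, 3.7]

b0_raw, b1_raw = simple_ols(x, y)
x_centered = [a - statistics.mean(x) for a in x]
b0_cen, b1_cen = simple_ols(x_centered, y)

print(b0_raw, b1_raw)  # intercept is the prediction at X = 0, far outside the data
print(b0_cen, b1_cen)  # intercept is the prediction at the sample mean of X
```

The slope is identical in both fits; only the intercept changes, and after centering it equals the sample mean of the outcome.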

Alternatively, the simple regression might model the relationship between a di- 
chotomous predictor variable [representing whether a subject was in the placebo 
(zero dosage) group or a treatment (10mg dosage) group] and some measure of, 
say, anxiety. The predictor could be dummy-coded such that a value of zero was as- 
signed for those in the placebo group with a value of one for those in the treatment 
group. This would mean that the intercept would represent the predicted anxiety 
level for a person who was in the placebo condition. 

The importance of assigning a meaningful reference point for a value of zero for the predictors in single-level regression extends to the inclusion of interactions between predictors in the single-level model. The reason for this is that the interpretation of a main effect can be affected by the inclusion of an interaction between predictors (resulting in the model: $Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \beta_3 X_i Z_i + e_i$). Specifically, if an interaction is modeled between, say, predictor variables X and Z, then the coefficient for the main effect of X represents the effect of X given Z is zero. Thus you want to ensure that a value of zero on Z is meaningful. Similarly, the main effect of Z would be interpreted (with the interaction of X and Z included in the model) as the effect of Z given X is zero. 
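
To make the point concrete: in the interaction model, the effect of X at a particular value z of Z is $\beta_1 + \beta_3 z$, so moving the zero point of Z changes what the "main effect" of X refers to. The Python sketch below uses hypothetical coefficient values and function names chosen only for illustration.

```python
def effect_of_x(b1, b3, z):
    """Effect of X at Z = z in the model Y = b0 + b1*X + b2*Z + b3*X*Z."""
    return b1 + b3 * z

def recenter_z(b0, b1, b2, b3, c):
    """Coefficients of the same fitted model after replacing Z with (Z - c)."""
    return b0 + b2 * c, b1 + b3 * c, b2, b3

# Hypothetical coefficients from a model fitted with uncentered Z.
b0, b1, b2, b3 = 10.0, 2.0, 0.5, 0.3

# Suppose the sample mean of Z is 4.0 and Z is then centered at that mean.
new_b0, new_b1, new_b2, new_b3 = recenter_z(b0, b1, b2, b3, c=4.0)
print(b1, new_b1)  # main effect of X before and after centering Z
```

The predictions of the two parameterizations are identical; only the reference point for the main effect of X has moved from Z = 0 to Z at its mean.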

The need for centering predictor variables extends beyond single-level regression equations to include multilevel modeling. In a two-level multilevel model, a choice of centering is available for any level one predictor variables included in the level one equation. The level one equation depicted in Equation 2 ($Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$) represents a single level one predictor, $X_{ij}$, added to the model to help explain variability in the outcome, $Y_{ij}$. As in a single-level regression equation, the intercept, $\beta_{0j}$, represents the predicted value of $Y_{ij}$ for someone with $X_{ij} = 0$. 

As in single-level regression, a score of zero on $X_{ij}$ might be meaningful (as in the example in which membership in a placebo condition might be assigned a zero on $X_{ij}$ as compared with a value of one assigned to those in a treatment condition). However, sometimes, a value of zero on the untransformed scale of $X_{ij}$ might be unrealistic. Raudenbush and Bryk (2002) use an example in which $X_{ij}$ is a subject’s SAT score for which feasible values only range from 200 to 800. In scenarios in which the value of zero on untransformed $X_{ij}$ is not meaningful, a researcher should center his/her predictor variable. 

Given a two-level model, there are two primary options (beyond not centering at all) for centering the level one predictor variable. One option involves centering the variable around the grand mean of the sample (as was described as an alternative for single-level regression), appropriately termed “grand-mean centering”. This is accomplished by transforming the score on $X_{ij}$ of subject i from group j (where, in the current example being demonstrated using HLM software, the grouping variable is the classroom) into the deviation of that score, $X_{ij}$, from the overall sample’s mean score on $X_{ij}$ (represented as $\bar{X}_{..}$). These transformed scores ($X_{ij} - \bar{X}_{..}$) are then used as the predictor of the outcome $Y_{ij}$ in Equation 2. This means that the intercept term in Equation 2 represents the predicted value on $Y_{ij}$ for someone with a value of zero on the predictor ($X_{ij} - \bar{X}_{..}$). A subject with a value of zero on the predictor has an $X_{ij}$ value equal to the grand mean, $\bar{X}_{..}$. Thus the intercept is the predicted value on $Y_{ij}$ for someone at the grand mean on $X_{ij}$. This grand-mean centering results in the intercept being interpretable as the mean on $Y_{ij}$ for group j adjusted by a function of the deviation of the group's mean from the grand mean (Raudenbush & Bryk, 2002). 

In multilevel modeling, another alternative is available for centering a level one predictor variable. This alternative is termed “group-mean centering” and involves transforming the score, $X_{ij}$, of person i in group j into the deviation of that person's score from the mean of that person's group j on $X_{ij}$: ($X_{ij} - \bar{X}_{.j}$). This modifies interpretation of the intercept, $\beta_{0j}$, so that it becomes the predicted value on $Y_{ij}$ for someone with zero for ($X_{ij} - \bar{X}_{.j}$), or someone with a score that is the equivalent of group j’s mean on $X_{ij}$. 
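
The two transformations can be compared side by side in a few lines of Python. The scores and classroom labels below are invented for illustration, and the function names are hypothetical; HLM performs the equivalent transformation internally when a centering option is chosen.

```python
import statistics
from collections import defaultdict

def grand_mean_center(x):
    """Subtract the overall sample mean from every score."""
    grand = statistics.mean(x)
    return [v - grand for v in x]

def group_mean_center(x, groups):
    """Subtract each case's own group mean from its score."""
    by_group = defaultdict(list)
    for v, g in zip(x, groups):
        by_group[g].append(v)
    means = {g: statistics.mean(vs) for g, vs in by_group.items()}
    return [v - means[g] for v, g in zip(x, groups)]

# Hypothetical predictor scores for six students in three classrooms.
iim = [4, 6, 8, 10, 12, 14]
classroom = ["a", "a", "b", "b", "c", "c"]

print(grand_mean_center(iim))             # deviations from the grand mean of 9
print(group_mean_center(iim, classroom))  # deviations from classroom means 5, 9, 13
```

Grand-mean centering subtracts the same constant from every case, whereas group-mean centering subtracts a different constant in each classroom, which is why the group-centered scores no longer carry any information about differences between classroom means.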

Several authors (including Kreft & de Leeuw, 1998; Raudenbush & Bryk, 
2002) provide a detailed explanation for the correspondence between a model in 
which grand-mean centering is used and one in which variables are not centered. 
Essentially, when grand mean centering is used, a constant (the sample’s mean on 
the relevant predictor) is subtracted from each case’s value on the predictor. This 
means that the parameter estimates resulting from grand-mean centering can be 
linearly transformed to obtain the relevant uncentered variables’ model’s coeffi- 
cients. This is not always the case when a variable has been group-mean centered. 
In group-mean centering, the mean of the case’s group on the group-mean centered 
predictor is subtracted from the case’s value on a predictor. Clearly, each group’s 
mean will not be the same on the predictor and thus the same constant is not subtracted 
from each case’s predictor value. The correspondence between a model with group-mean 
centered variables and models without centering or with grand-mean centering is not 
generally direct. 


FIGURE 8.12 Adding a level one predictor to a two-level model in HLM. (Figure annotations: $u_{0j}$ is classroom j’s deviation from $\gamma_{00}$; $\gamma_{00}$ is the average intercept across classrooms; $u_{1j}$ is classroom j’s deviation from $\gamma_{10}$; $\gamma_{10}$ is the average slope across classrooms.)

The reader should also be cautioned that, as in single-level modeling, choice of 
centering for predictors also impacts interpretation of main effects for variables 
when interactions that include that variable are modeled. This applies in multilevel 
modeling to same-level interactions between predictors as well as cross-level in- 
teractions in which, say, a level two predictor might be used to explain the relation- 
ship between a level one predictor and the outcome of interest. 

Choice of grand-mean versus group-mean centering clearly impacts the interpretation of the intercept. However, as described in detail by Raudenbush and Bryk (2002), the choice of centering can also impact estimation of the level two variances of the intercept and of the slope or coefficient of the predictor across groups (here, classrooms). This means that estimation of the variance in the $u_{0j}$s and the $u_{1j}$s (see Equation 5) will also be impacted by whether group-mean centering or grand-mean centering (and/or no centering) is used. As summarized by Raudenbush and Bryk: “be conscious of the choice of location for each level-1 predictor because it has implications for interpretation of $\beta_{0j}$, var($\beta_{0j}$) and by implication, all of the covariances involving $\beta_{0j}$. In general, sensible choices of location depend on the purposes of the research. No single rule covers all cases. It is important, however, that the researcher carefully consider choices of location in light of those purposes; and it is vital to keep the location in mind while interpreting results” (Raudenbush & Bryk, 2002, p. 34). Several authors provide more detailed discussion of choice of centering than can be presented here (Snijders & Bosker, 1999; Kreft & de Leeuw, 1998; Raudenbush & Bryk, 2002). The reader is strongly encouraged to refer to these texts to help understand centering in more detail. 

In the example we are using to demonstrate use of HLM software, we will use grand-mean centering for the IIM variable. IIM is added as a grand-mean centered level one variable by clicking on the variable and requesting “add variable grand centered.” In version 6 of HLM, the default is for a predictor’s effect to be modeled as fixed. (In version 5 of HLM, the default was for the effect to be random.) To model this effect as random, click on the level two equation for the coefficient of IIM and $\beta_1 = \gamma_{10}$ will become $\beta_1 = \gamma_{10} + u_1$ (see Figure 8.12). Again, the HLM model does not present the relevant i and j subscripts. 

Note that the regression coefficients in level 1 are response (dependent) variables 
in level 2. In this regard, the following from Kreft and de Leeuw (1998, p. 2) 
is very important, “It is essential to realize that multilevel models involve a statisti- 
cal integration of the different models specified at the levels of interest. The sim- 
plest integration takes place in the random coefficients model, where the first level 
regression coefficients are treated as random variables at the second level.” 

The output appears as before, although with additional parameters estimated given that this second model includes an additional predictor. The fixed effect estimates will be presented and discussed first and then the random effects estimates. [The user should be reminded that the first results that appear in HLM output are initial estimates. The user needs to look at the end of the output file to find the final estimates.] 

Only two fixed effects were modeled: the intercept, $\gamma_{00}$, and the slope, $\gamma_{10}$: 


Final estimation of fixed effects 
(with robust standard errors) 

                                     Standard                 Approx. 
Fixed Effect          Coefficient    Error         T-ratio    df     P-value 
For INTRCPT1, B0 
   INTRCPT2, G00      98.768313      0.886268      111.443    29     0.000 
For IIM slope, B1 
   INTRCPT2, G10      0.90           [illegible]   5.224      29     0.000 


From the results (above), both parameter estimates differ significantly from zero. The intercept, $\gamma_{00}$, estimate is 98.77 (t(29) = 111.44, p < .0001) and the slope, $\gamma_{10}$, estimate is .90 (t(29) = 5.22, p < .0001). This means that the average Math_12 score, controlling for IIM, is predicted to be 98.77. Here, due to the grand-mean centering of IIM, the “controlling for IIM” can be interpreted as “for a student with the mean score on IIM.” The value of the slope coefficient estimate represents an estimate of the change in Math_12 score predicted for a change of one in IIM score. Thus, these fixed effects coefficient estimates are interpreted very similarly to coefficients in OLS regression. Here, the higher a student’s IIM score, the higher their predicted Math_12 score will be. 

The output describing the random effects estimates appears at the end of the output and is as follows: 


Final estimation of variance components: 

                    Standard     Variance 
Random Effect       Deviation    Component    df    Chi-square    P-value 
INTRCPT1,     U0    3.90385      15.24001     29    65.22934      0.000 
IIM slope,    U1    0.68000      0.46239      29    48.11048      0.014 
level-1,      R     5.10085      26.01870 


The level one variance explained by the addition of IIM to the model is seen in the reduction of the level one variance estimate, $\sigma^2$, from a value of 50.47 in the unconditional model to a value of 26.02 in the current conditional model. In fact, the proportion of the level one variance explained with the addition of IIM to the model is (50.47 - 26.02)/50.47 = .4844, or 48.44%. In terms of the variability in the outcome among classrooms, there is still a significant amount of variability remaining in the intercept ($\tau_{00}$ = 15.24, $\chi^2$(29) = 65.22, p < .0001). It cannot be assumed that the average Math_12 score controlling for IIM is constant across classrooms. There is also a significant amount of variability in the IIM slope coefficient across classrooms ($\tau_{11}$ = .46, $\chi^2$(29) = 48.11, p < .05). Thus it cannot be assumed that the relationship between IIM and Math_12 is fixed across classrooms. 
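
The proportion of level one variance explained is a simple ratio of variance estimates from two models. A small Python sketch, with a hypothetical function name, reproduces the figure for the model with IIM:

```python
def proportion_variance_explained(var_baseline, var_model):
    """Proportional reduction in a variance component relative to a baseline model."""
    return (var_baseline - var_model) / var_baseline

# Level one variance: unconditional model versus the model with IIM added.
explained_by_iim = proportion_variance_explained(50.47, 26.02)
print(f"{explained_by_iim:.2%}")
```

The same function applies to any later model by passing that model's level one variance estimate as `var_model` and keeping the unconditional estimate as the baseline.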

Additional random effects information appears in the output right after the in- 
formation about the starting values and iterations required for convergence. 


Tau 
INTRCPT1, B0     15.24001     0.43587 
     IIM, B1      0.43587     0.46239 


Tau (as correlations) 
INTRCPT1, B0     1.000     0.164 
     IIM, B1     0.164     1.000 
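
The second matrix can be reproduced from the first by dividing the covariance by the product of the two standard deviations. The Python sketch below does this for the off-diagonal element; the function name is an illustrative choice.

```python
import math

def covariance_to_correlation(var_a, cov_ab, var_b):
    """Correlation implied by two variances and their covariance."""
    return cov_ab / math.sqrt(var_a * var_b)

# Elements of the first Tau matrix: intercept variance, covariance, slope variance.
r = covariance_to_correlation(15.24001, 0.43587, 0.46239)
print(round(r, 3))
```

Rounded to three decimals, the result agrees with the off-diagonal entry of the “Tau (as correlations)” matrix.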


The first “Tau” ($\tau$) matrix provides the estimates of the elements of the covariance matrix of level two random effects: 

$$\begin{bmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{bmatrix}$$

where $\tau_{00}$ is the variance of the intercept residuals, $u_{0j}$, $\tau_{11}$ is the variance of the slope residuals, $u_{1j}$, and $\tau_{01}$ is the covariance between the random effects, $u_{0j}$ and $u_{1j}$. The second Tau matrix is the correlation matrix corresponding to the first Tau matrix. It seems that there is not a strong correlation (r = .164) between the intercepts and the slopes. 

The last lines in the output indicate that the deviance of this second model is 880.10, associated with four covariance parameters that are estimated ($\sigma^2$, $\tau_{00}$, $\tau_{11}$, and $\tau_{01}$). The difference in the deviances between the unconditional model and the current conditional model is assumed to follow a (large-sample) $\chi^2$ distribution with degrees of freedom (DF) equal to the difference in the number of random effects parameters that are estimated in the two “nested” models. The difference in the deviances, 945.07 - 880.10 = 64.97, can thus be tested against a $\chi^2$ distribution with 2 DF. The statistical significance of the deviance difference indicates that the fit of the simpler (unconditional) model is significantly worse and thus the simpler model should be rejected. 
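
The deviance comparison can be checked by hand. The Python sketch below is one way to do so using only the standard library; `chi2_sf` is a hypothetical helper (not an HLM function) that computes the upper-tail chi-square probability for integer degrees of freedom from a standard recurrence for the incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

deviance_unconditional = 945.07   # two covariance parameters estimated
deviance_with_iim = 880.10        # four covariance parameters estimated
difference = deviance_unconditional - deviance_with_iim
p_value = chi2_sf(difference, df=4 - 2)
print(round(difference, 2), p_value)
```

The resulting p-value is far below .05, in line with the conclusion that the unconditional model fits significantly worse. The comparison reported later in this chapter (a deviance difference of 85.47 on 3 DF) can be checked with the same helper.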


Adding a Second Level One Predictor 
to the Level One Equation 


Because there still remains a substantial amount of variability in Math_12 within 
classrooms, and since the researcher might hypothesize that there are gender dif- 
ferences in Math_12 scores (controlling for IIM), a second level one predictor 
(Gender) will be added to the level one model. This is simply accomplished (in 
HLM software) by clicking on the relevant Gender variable. The variable Gender 
is coded with a zero for males and a one for females. The variable will be added as 
an uncentered predictor. Again, the default in HLM version 6 for adding a predic- 
tor is that it is to be modeled as a fixed effect. Click on the effect to change it so it is 
modeled as random and thus the level one equation to be estimated is: 


$$\text{Math\_12}_{ij} = \beta_{0j} + \beta_{1j}\text{Gender}_{ij} + \beta_{2j}\text{IIM}_{ij} + r_{ij} \qquad (9)$$


and the level two equations are: 


$$\begin{aligned} \beta_{0j} &= \gamma_{00} + u_{0j} \\ \beta_{1j} &= \gamma_{10} + u_{1j} \\ \beta_{2j} &= \gamma_{20} + u_{2j} \end{aligned} \qquad (10)$$


This will appear in the HLM command window without i and j subscripts, as can be seen in Figure 8.13. 


FIGURE 8.13 Adding a second level one predictor to a two-level model in HLM. (Screenshot of the WHLM window for math12.mdm with command file math12.hlm, showing the level 1 model MATH_12 = $\beta_0$ + $\beta_1$(GENDER) + $\beta_2$(IIM) + r and the level 2 models $\beta_0 = \gamma_{00} + u_0$, $\beta_1 = \gamma_{10} + u_1$, and $\beta_2 = \gamma_{20} + u_2$.)

The user can see that the centering used for Gender differs from that used for IIM from the different font style used in the HLM window for those variables in the model appearing in Figure 8.13 above. For an uncentered variable, the variable name is not highlighted; for a group-mean centered variable, the name appears in bold font; and for a grand-mean centered variable, the name is bolded and italicized. 

The fixed effects results for the model contained in Equations 9 and 10 are as 
follows: 


Final estimation of fixed effects 
(with robust standard errors) 

                                      Standard               Approx. 
Fixed Effect           Coefficient    Error       T-ratio    df     P-value 
For INTRCPT1, B0 
   INTRCPT2, G00       92.72          0.7568      122.518    29     0.000 
For GENDER slope, B1 
   INTRCPT2, G10       10.750966      0.900732    11.94      29     0.000 
For IIM slope, B2 
   INTRCPT2, G20       0.552540       0.102726    5.379      29     0.000 


Now the intercept represents the average Math_12 score for a boy with an IIM score equal to the sample's mean IIM score. The intercept is significantly greater than zero ($\gamma_{00}$ = 92.72, t(29) = 122.52, p < .0001). There is a significant Gender effect favoring girls over boys ($\gamma_{10}$ = 10.75, t(29) = 11.94, p < .0001). The magnitude of this gender effect indicates that girls are predicted to have scores over 10 points higher on the Math_12 than boys with the same IIM score. The coefficient for IIM is also significantly greater than zero ($\gamma_{20}$ = 0.55, t(29) = 5.38, p < .0001), indicating a strong positive relationship between students’ interest in mathematics and their performance on the Math_12. 
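
These fixed effects can be turned into predicted scores for an average classroom. The Python sketch below rounds the coefficients from the fixed effects table and expresses IIM as a deviation from the sample mean (the grand mean itself is not reported here, so the deviation is supplied directly). The function and argument names are hypothetical.

```python
def predicted_math12(gender, iim_deviation, g00=92.72, g10=10.75, g20=0.5525):
    """Predicted Math_12 score from the fixed effects only.

    gender: 0 for males, 1 for females (uncentered).
    iim_deviation: student's IIM score minus the sample mean IIM score.
    """
    return g00 + g10 * gender + g20 * iim_deviation

boy_at_mean = predicted_math12(gender=0, iim_deviation=0.0)
girl_at_mean = predicted_math12(gender=1, iim_deviation=0.0)
girl_two_above = predicted_math12(gender=1, iim_deviation=2.0)
print(boy_at_mean, girl_at_mean, girl_two_above)
```

A boy at the mean IIM score is predicted to score at the intercept, and a girl with the same IIM score is predicted to score higher by the Gender coefficient.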

The table of random effects’ estimates from the HLM output appears below: 

Final estimation of variance components: 

                     Standard     Variance 
Random Effect        Deviation    Component    df    Chi-square     P-value 
INTRCPT1,       U0   2.24432      5.03699      18    12.50906       >.500 
GENDER slope,   U1   1.40596      1.9767       18    [illegible]    0.240 
IIM slope,      U2   0.29608      0.08766      18    29.18635       0.046 
level-1,        R    4.21052      17.72844 


The estimate of the remaining level one variability is now 17.73, indicating that the addition of Gender has explained an additional 16.43% of the variability within classrooms (originally 50.47 in the unconditional model, down to 26.02 in the conditional model with IIM only as a predictor). Only 35.13% of the level one variability remains unexplained (17.73/50.47 = .3513). The information contained in the table seems to indicate that there is not a significant amount of level two (among-classrooms) variability in the intercept or the Gender coefficient (p > .05). It should be emphasized that due to the small sample size within groups (i.e., the average number of children per classroom in this dataset is only 4.5) there is only low statistical power for estimation of the random effects (Hox, 2002). 

The deviance is 794.63 with seven parameters estimated (three variances of the random effects $u_{0j}$, $u_{1j}$, and $u_{2j}$; three covariances between the three random effects; and $\sigma^2$). The difference in the deviances between this model and the one including only IIM is 85.47, which is statistically significant ($\chi^2$(3) = 85.47, p < .0001), indicating that there would be a significant decrease in fit with Gender not included in the model. 

Despite the lack of significance in the level two variability and due to the likely 
lack of statistical power in the dataset for identifying remaining level two variabil- 
ity (as well as for pedagogical purposes), addition of a level two (classroom) pre- 
dictor to the model will now be demonstrated. 


8.9 ADDITION OF A LEVEL TWO PREDICTOR 
TO A TWO-LEVEL HLM 


In the classroom dataset, there was a measure of each classroom's mathematics pedagogy resources (Resource). It was hypothesized that there was a positive relationship between the amount of such resources in a classroom and the class's performance on the Math_12, controlling for gender differences and for students' interest in mathematics. This translates into a hypothesis that Resource would predict some of the variability in the intercept. The original level one equation (see Equation 9) will remain unchanged. However, the set of level two equations (Equation 10) needs to be modified to include Resource as a predictor of $\beta_{0j}$, such that: 


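$$\begin{aligned} \beta_{0j} &= \gamma_{00} + \gamma_{01}\text{Resource}_j + u_{0j} \\ \beta_{1j} &= \gamma_{10} + u_{1j} \\ \beta_{2j} &= \gamma_{20} + u_{2j} \end{aligned} \qquad (11)$$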


To accomplish this in HLM, the user must first click on the relevant level two 
equation (the first of the three listed in Equation 11). In the current example, the 
user is interested in adding the level two variable to the intercept equation (the one 
for $\beta_{0j}$) so the user should make sure that that equation is highlighted. Then the user 
should click on the “Level 2” button in the upper left corner to call up the possible 
level two variables. Only one variable, Resource, can be added. (It should be noted 
here that the default in HLM is to include an intercept in the model. This default 
can be over-ridden by clicking on the relevant Intercept variable. See the HLM 
manual for further details). Once the user has clicked on Resource, the type of cen- 
tering for the variable must be selected (from uncentered or grand-mean centered). 
Grand-mean centering will be selected so that the coefficient, Yo1, can be inter- 
preted as describing a classroom with an average amount of resources. Once this is 
achieved, the HLM command screen appears as in Figure 8.14 below. 


[Figure 8.14: HLM command window (WHLM: hlm2; MDM file: math12.mdm; command
file: math12.hlm) displaying the following model.
LEVEL 1 MODEL: MATH_12 = $\beta_0 + \beta_1(\text{GENDER}) + \beta_2(\text{IIM}) + r$
LEVEL 2 MODEL (RESOURCE grand-mean centered):
$\beta_0 = \gamma_{00} + \gamma_{01}(\text{RESOURCE}) + u_0$
$\beta_1 = \gamma_{10} + u_1$
$\beta_2 = \gamma_{20} + u_2$]


FIGURE 8.14 Adding a level two predictor to a two-level model in HLM.
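
As a rough illustration of what grand-mean centering does to Resource, the
sketch below subtracts the mean of the 30 classroom Resource values (taken
from the level two data file listed at the end of the chapter) from each
value. The function name grand_mean_center is hypothetical and is not part of
HLM, which carries out this step itself when the centering option is chosen.

```python
from statistics import mean

# Resource scores for the 30 classrooms, from the level two data file
# listed at the end of the chapter.
resource = [7, 7, 5, 7, 7, 5, 5, 3, 5, 6, 5, 6, 5, 6, 5,
            5, 6, 7, 4, 6, 6, 5, 5, 6, 5, 6, 6, 5, 6, 7]


def grand_mean_center(values):
    """Subtract the overall (grand) mean from every value."""
    grand_mean = mean(values)
    return [v - grand_mean for v in values]


centered = grand_mean_center(resource)
print(round(mean(resource), 3))  # grand mean of Resource
print(round(centered[0], 3))     # centered score for classroom 1
```

After centering, a score of 0 corresponds to a classroom with an average
amount of resources, which is why the intercept then refers to such a
classroom.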


Once the output file has been specified in “Basic Settings” and the command
file saved, the analysis can be run. More iterations are needed than the
default maximum (100), as evidenced in the MS-DOS window that resulted
(presented in Figure 8.15).


[Figure 8.15: MS-DOS window listing the value of the likelihood function at
each iteration up to iteration 99 (values between -3.940063E+002 and
-3.940029E+002), followed by the prompt: “The maximum number of iterations has
been reached, but the analysis has not converged. Do you want to continue
until convergence?”]


FIGURE 8.15 MS-DOS window for a solution that was slow to converge.

The user is prompted at the bottom of the screen that the program will continue
iterating toward a final solution if the user so desires. The user should enter
“Y” if willing to wait through additional iterations. Note that a solution
reached in fewer iterations can be considered more stable. In addition,
estimating multiple random effects with a possibly insufficient sample size can
make a solution harder to locate. When prompted to use additional iterations,
the user may continue with the current model or may instead re-estimate a
simpler model in which one or several of the parameters are modeled as fixed
instead of random.

When the model's estimation did converge, after 1497 iterations, the additional
fixed effect estimate, $\gamma_{01}$, appeared in the output:

Final estimation of fixed effects
(with robust standard errors)

                                    Standard               Approx.
Fixed Effect          Coefficient   Error       T-ratio    d.f.    P-value
For INTRCPT1, B0
  INTRCPT2, G00       [illegible]   0.532195    115.659    28      0.000
  RESOURCE, G01          1.416373   0.605262      2.340    28      0.027
For GENDER slope, B1
  INTRCPT2, G10         10.612002   0.852843     12.443    29      0.000
For IIM slope, B2
  INTRCPT2, G20          0.598363   0.097142      6.160    29      0.000


The classroom Resource measure was found to be significantly positively related
to Math_12, controlling for gender and IIM ($\gamma_{01}$ = 1.42, t(28) = 2.34,
p < .05). From the random effects estimates output:


Final estimation of variance components:

                     Standard     Variance
Random Effect        Deviation    Component    df    Chi-square     P-value
INTRCPT1,       U0    1.54086      2.37425     17     11.53940      >.500
GENDER slope,   U1    0.78565      0.61724     18    [illegible]     0.259
IIM slope,      U2    0.24638      0.06070     18    [illegible]     0.066
level-1,        R     4.22362     17.83898


The addition of Resource has reduced the level two variability in the intercept
from 5.04 (in the model that included Gender and IIM) to 2.37. The deviance of
the current model, in which seven covariance parameters were estimated, was
787.94.
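
One common way to summarize such a reduction is as a proportion of the earlier
variance estimate. The short sketch below does that arithmetic for the two
intercept variances quoted above; the function name is hypothetical, and the
resulting proportion is an illustration rather than a figure reported in the
HLM output.

```python
def proportion_variance_explained(var_before, var_after):
    """Proportional reduction in a variance component between two models."""
    return (var_before - var_after) / var_before


tau00_before = 5.04  # intercept variance with Gender and IIM only
tau00_after = 2.37   # intercept variance after adding Resource
print(round(proportion_variance_explained(tau00_before, tau00_after), 2))
```

Under these figures, Resource accounts for roughly half of the level two
intercept variability that remained after Gender and IIM were in the model.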


8.10 EVALUATING THE EFFICACY OF A TREATMENT 


HLM can be used to evaluate whether two or more counseling (or, say, teaching)
methods have a differential effect on some outcome. This example investigates
whether two counseling methods have a differential effect on empathy. Note
that a smaller sample size is used here than is typically recommended for HLM
analyses; this is done to facilitate the presentation. Five groups of patients
are treated with each counseling method, and each group has four patients.
Although groups are nested within counseling method, the methods do not
constitute a clustering level because the research question concerns a
comparison of the two counseling methods. Thus, we have a two-level nested
design, with patients (level one) nested within groups (level two) and
counseling method used as a fixed level two (group-level) variable. Counseling
method will be used as a predictor to explain some of the variability between
groups. We have two separate data files with group ID (gp) in both files. The
level one file contains group ID with data (scores on the empathy scale and
the patient's id number) for the four subjects in each of the ten groups. In
addition, the level one file includes a measure of the patient's contentment
(content). The level two file has the group ID variable along with the
counseling method (couns) employed in the relevant group, coded either as 0 or
1. The data files are presented below.

Level 1

PatId   Gp   Emp   Content
  1      1    23     30
  2      1    22     33
  3      1    20     30
  4      1    19     28
  5      2    16     19
  6      2    17     27
  7      2    18     28
  8      2    19     37
  9      3    25     35
 10      3    28     38
 11      3    29     38
 12      3    31     37
 13      4    27     44
 14      4    23     30
 15      4    22     31
 16      4    21     25
 17      5    32     37
 18      5    31     46
 19      5    28     42
 20      5    26     39
 21      6    13     24
 22      6    12     19
 23      6    14     31
 24      6    15     25
 25      7    16     27
 26      7    17     34
 27      7    14     24
 28      7    12     22
 29      8    11     25
 30      8    10     17
 31      8    20     31
 32      8    15     30
 33      9    21     26
 34      9    18     28
 35      9    19     27
 36      9    23     33
 37     10    18     24
 38     10    17     33
 39     10    16     33
 40     10    23     29

Level 2

Gp   Couns
 1     0
 2     0
 3     0
 4     0
 5     0
 6     1
 7     1
 8     1
 9     1
10     1
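
The way the two files are tied together through the shared group ID can be
sketched as below. This is only an illustration of the linkage, not a
description of how HLM builds its MDM file internally; the function name
link_levels and the dictionary layout are assumptions, and only groups 1 and 6
are included to keep the example short.

```python
# Level one records: (patient id, group id, empathy, contentment).
level1 = [
    (1, 1, 23, 30), (2, 1, 22, 33), (3, 1, 20, 30), (4, 1, 19, 28),
    (21, 6, 13, 24), (22, 6, 12, 19), (23, 6, 14, 31), (24, 6, 15, 25),
]

# Level two records: group id -> counseling method code (couns).
level2 = {1: 0, 6: 1}


def link_levels(level1_rows, couns_by_group):
    """Attach each group's counseling code to its patients' records."""
    return [
        {"patid": pid, "gp": gp, "emp": emp, "content": content,
         "couns": couns_by_group[gp]}
        for pid, gp, emp, content in level1_rows
    ]


linked = link_levels(level1, level2)
print(linked[0])
print(linked[-1])
```

Every patient in a group receives the same couns value, which is what makes
counseling method a group-level (level two) variable.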

The MDM file is constructed and then the analysis conducted using HLM6. The 
model estimated includes counseling method as a fixed predictor. No level one pre- 
dictors are included in the model. The HLM results are presented below: 


Final estimation of fixed effects:

                                    Standard                  Approx.
Fixed Effect          Coefficient   Error        T-ratio      d.f.   P-value
For INTRCPT1, B0
  INTRCPT2, G00        23.850000    [illegible]  [illegible]   8     0.000
  COUNS, G01           -7.650000    2.581182      -2.964       8     0.019

The outcome variable is EMP

Final estimation of fixed effects
(with robust standard errors)

                                    Standard                  Approx.
Fixed Effect          Coefficient   Error        T-ratio      d.f.   P-value
For INTRCPT1, B0
  INTRCPT2, G00        23.850000    1.973069      12.088       8     0.000
  COUNS, G01           -7.650000    2.308679      -3.314       8     0.012


The robust standard errors are appropriate for datasets having a
moderate to large number of level 2 units. These data do not meet
this criterion.


Final estimation of variance components:

                     Standard     Variance
Random Effect        Deviation    Component    df    Chi-square    P-value
INTRCPT1,       U0    3.86868     14.96667      8     78.86560      0.000
level-1,        R     2.59968      6.75833


Statistics for current covariance components model
Deviance = 204.[illegible]
Number of estimated parameters = 2

As noted in the output, there is an insufficient number of level two (group)
units, and thus the results with robust standard errors should not be used
here. Note that the counseling method effect [t(8) = -2.964, p = .019] is
statistically significant, with the method coded 0 having a stronger impact on
empathy than the method coded 1. Note also that the degrees of freedom is 8,
which corresponds to the degrees of freedom between groups for a regular
ANOVA. Maxwell and Delaney (2004, p. 514) note that the proper error term for a
nested design such as this is groups within methods. This is the error term
that SPSS uses when it analyzes these data. Control lines and selected output
from an SPSS analysis are given below:


SPSS Control Lines for Univariate Nested Design 


DATA LIST FREE/COUNS GP SUB EMP.
BEGIN DATA.
0 1 1 23   0 1 2 22   0 1 3 20   0 1 4 19
0 2 1 16   0 2 2 17   0 2 3 18   0 2 4 19
0 3 1 25   0 3 2 28   0 3 3 29   0 3 4 31
0 4 1 27   0 4 2 23   0 4 3 22   0 4 4 21
0 5 1 32   0 5 2 31   0 5 3 28   0 5 4 26
1 6 1 13   1 6 2 12   1 6 3 14   1 6 4 15
1 7 1 16   1 7 2 17   1 7 3 14   1 7 4 12
1 8 1 11   1 8 2 10   1 8 3 20   1 8 4 15
1 9 1 21   1 9 2 18   1 9 3 19   1 9 4 23
1 10 1 18  1 10 2 17  1 10 3 16  1 10 4 23
END DATA.
UNIANOVA EMP BY COUNS GP SUB/
 RANDOM GP SUB/
 DESIGN SUB(GP(COUNS)) GP(COUNS) COUNS/
 PRINT = DESCRIPTIVES/.


Note that in each data record the first number indicates the counseling method
(0 or 1), the second number the group the patient is in (1 through 10), the
third number the subject number within the group, and the fourth number the
empathy score. Thus the first set of four numbers indicates that the subject
is in the counseling method coded 0, is the first person in group 1, and has
an empathy score of 23. The “RANDOM GP SUB/” line indicates that group and
subject are being modeled as random factors. Lastly, the DESIGN command line
indicates a nested design, with patients nested within groups, which in turn
are nested within counseling methods.

SPSS Printout for Three-Level Nested Design


Tests of Between-Subjects Effects

                               Type III
Source                         Sum of Squares   df   Mean Square    F         Sig.
Intercept        Hypothesis    16040.025         1   16040.025      240.751
                 Error           533.000         8      66.625(a)
SUB(GP(COUNS))   Hypothesis      202.750        30       6.758
                 Error              .000         0        .(b)
GP(COUNS)        Hypothesis      533.000         8      66.625
                 Error           202.750        30       6.758(c)
COUNS            Hypothesis      585.225         1     585.225        8.784   .018
                 Error           533.000         8      66.625(a)


a MS(GP(COUNS))
b MS(Error)
c MS(SUB(GP(COUNS)))


Note that the error term for the counseling method effect is groups within
methods. Remember that there are 5 groups within each of the two counseling
methods, so that the degrees of freedom is 8. This can be seen on the SPSS
output above, where F = 8.784, p = .018 for the counseling effect (couns). This
corresponds (within rounding error) to the square of the t-ratio found with
HLM for the couns variable: $(-2.964)^2 = 8.785$, indicating the
correspondence between the SPSS and HLM analyses for this fixed effect.
However, the error term for groups within counseling in SPSS is NOT correct
because it is based on 30 degrees of freedom (for the error term). The degrees
of freedom for error SHOULD be less than 30 because the observations within
the groups are dependent. Here, one would prefer the results from the HLM6
analysis, which indicate significant group variability ($\chi^2$ = 78.866,
p < .05). Note lastly that for an analysis of three counseling methods, two
dummy variables would be needed to identify group membership, and both of
these variables could be used as predictors at level 2.
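
The correspondence between the two analyses can be checked by hand. The sketch
below recomputes the nested ANOVA sums of squares from the empathy scores and
derives the F and t for the counseling effect, along with the between-group
variance and the chi-square for group variability. It relies on the design
being balanced (four patients in every group), for which these ANOVA-based
quantities coincide with the values in the HLM printout; the variable names
are illustrative only.

```python
from math import sqrt

# Empathy scores by group; groups 1-5 received method 0, groups 6-10 method 1.
emp = {
    1: [23, 22, 20, 19], 2: [16, 17, 18, 19], 3: [25, 28, 29, 31],
    4: [27, 23, 22, 21], 5: [32, 31, 28, 26], 6: [13, 12, 14, 15],
    7: [16, 17, 14, 12], 8: [11, 10, 20, 15], 9: [21, 18, 19, 23],
    10: [18, 17, 16, 23],
}
couns = {g: 0 if g <= 5 else 1 for g in emp}


def mean(xs):
    return sum(xs) / len(xs)


n = 4  # patients per group
group_mean = {g: mean(scores) for g, scores in emp.items()}
method_mean = {m: mean([x for g in emp if couns[g] == m for x in emp[g]])
               for m in (0, 1)}
grand_mean = mean([x for scores in emp.values() for x in scores])

ss_couns = sum(n * (method_mean[couns[g]] - grand_mean) ** 2 for g in emp)
ss_groups = sum(n * (group_mean[g] - method_mean[couns[g]]) ** 2 for g in emp)
ss_within = sum((x - group_mean[g]) ** 2 for g in emp for x in emp[g])

ms_groups = ss_groups / 8    # 10 groups minus 2 methods
ms_within = ss_within / 30   # 40 patients minus 10 groups

# Counseling effect tested against groups within methods.
f_couns = ss_couns / ms_groups
se_diff = sqrt(ms_groups * (1 / 20 + 1 / 20))  # 20 patients per method
t_couns = (method_mean[1] - method_mean[0]) / se_diff

tau00 = (ms_groups - ms_within) / n  # between-group variance component
chi_sq = ss_groups / ms_within       # test of group variability, 8 df

print(round(f_couns, 3), round(t_couns, 3))
print(round(tau00, 5), round(ms_within, 5), round(chi_sq, 3))
```

The sums of squares agree with the SPSS table (585.225, 533.000, and 202.750),
and the derived quantities agree with the HLM printout: t = -2.964, a
between-group variance of 14.96667, a level one variance of 6.75833, and a
chi-square of about 78.866.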


Adding a Level One Predictor to the Empathy Model Data 
In the model estimated for the Empathy data above, level one is formulated as:

$\text{Emp}_{ij} = \beta_{0j} + r_{ij}$

and level two as:

$\beta_{0j} = \gamma_{00} + \gamma_{01}\,\text{Tx}_j + u_{0j}$


The variability in the intercept across treatment groups ($\tau_{00}$), even
after controlling for treatment effects, is seen to be significantly greater
than zero [$\chi^2$(8) = 78.86560, p < .05]. A researcher might be interested
in adding a level one predictor to help explain some of this remaining
variability in Empathy using the patient's level of Contentment, with the
level one formulation becoming:


$\text{Emp}_{ij} = \beta_{0j} + \beta_{1j}\,\text{Content}_{ij} + r_{ij}$

and at level two:

$\beta_{0j} = \gamma_{00} + \gamma_{01}\,\text{Tx}_j + u_{0j}$
$\beta_{1j} = \gamma_{10} + u_{1j}$


Addition of the level one predictor (Content) modifies interpretation of the
intercept, $\gamma_{00}$, from the “predicted empathy score for patients in
the group for which Tx = 0” to the “predicted empathy score for patients
controlling for level of contentment (i.e., for whom Content = 0) in a
treatment group for which Tx = 0.” Note that we will grand-mean center Content
so that a patient with Content = 0 is one at the mean on the contentment scale.

Estimating this model with HLM, we find the following fixed effect estimates:


Final estimation of fixed effects:

                                    Standard               Approx.
Fixed Effect          Coefficient   Error       T-ratio    d.f.    P-value
For INTRCPT1, B0
  INTRCPT2, G00        22.584852    1.228768     18.380     8      0.000
  TX, G01              -5.315897    1.719344     -3.094     8      0.016
For CON slope, B1
  INTRCPT2, G10         0.355810    0.073244      4.858     9      0.001


We see that the coefficient for Content is statistically significant
($\gamma_{10}$ = 0.356, t(9) = 4.86, p < .05). We can also see that a treatment
effect is still found to favor the groups for whom Tx = 0 ($\gamma_{01}$ =
-5.32, t(8) = -3.094, p < .05).


The random effects estimates were:


Final estimation of variance components:

                     Standard     Variance
Random Effect        Deviation    Component    df    Chi-square    P-value
INTRCPT1,       U0    2.44137      5.96028      8     29.46502      0.000
CON slope,      U1    0.04038      0.00163      9      9.39983      0.401
level-1,        R     2.22379      4.94526

For this model, in which the Content slope is modeled to vary randomly across
therapy groups, we can thus see that a significant amount of variability
remains in the intercept (even with Content added to the model). However,
there is not a significant amount of variability between therapy groups in the
relationship between patients' Content and their Empathy scores. Thus our
final model will include Content modeled as an effect that is fixed across
therapy groups, such that at level two we model:


$\beta_{0j} = \gamma_{00} + \gamma_{01}\,\text{Tx}_j + u_{0j}$
$\beta_{1j} = \gamma_{10}$


The fixed effects estimates were:


Final estimation of fixed effects:

                                    Standard                  Approx.
Fixed Effect          Coefficient   Error        T-ratio      d.f.   P-value
For INTRCPT1, B0
  INTRCPT2, G00        22.77        1.240052     [illegible]    8    0.000
  TX, G01              -5.49618     1.796275      -3.060        8    [illegible]
For CON slope, B1
  INTRCPT2, G10         0.341875    0.073204       4.670       37    0.000

These parameter estimates can be substituted into the level two formulation:


$\beta_{0j} = 22.77 - 5.50\,\text{Tx}_j + u_{0j}$
$\beta_{1j} = 0.34$


To facilitate interpretation of the results, it can help to obtain the single
equation (by substituting the level two equations for $\beta_{0j}$ and
$\beta_{1j}$ into the level one equation):


$\text{Emp}_{ij} = 22.77 - 5.50\,\text{Tx}_j + 0.34\,\text{Content}_{ij} + u_{0j} + r_{ij}$


and then (as can be done with simple regression), values for the predictors can
be substituted into this single equation. For example, substituting the
relevant combinations of Tx and Content scores into the single multilevel
equation results in the predicted Empathy scores that appear in the following
table:

               Tx = 0    Tx = 1
Content = 0    22.77     17.27
Content = 1    23.11     17.61
Content = 2    23.45     17.95

Thus, for example, the value for $\gamma_{00}$ represents the predicted Emp
score when Tx = 0 and for someone at the mean on Content (i.e., for someone for
whom Content = 0). The Tx coefficient represents the treatment's effect on Emp
controlling for Content levels. In other words, for two participants with the
same Content score, one of whom is in the Tx = 0 group while the other is in
the Tx = 1 group, there will be a predicted difference of 5.5 points on the Emp
scale (with the difference favoring the Tx = 0 member). In the table above, the
difference for two people with Content = 0 is 22.77 - 17.27 = 5.5. Similarly,
for two people with Content = 2 (2 points above the mean on Content), the
predicted difference for those in Tx = 0 versus Tx = 1 is 23.45 - 17.95 = 5.5.
Lastly, the Content coefficient indicates that for two patients in the same Tx
group, a difference of one on the Content scale is associated with an Emp score
predicted to be .34 points higher. In other words, controlling for the
treatment effect, the more contented a patient, the better their Empathy is
anticipated to be. Thus we see in the table that for two people in the Tx = 0
group, one with Content = 1 and the other with Content = 0, the difference in
their predicted Emp scores is 23.11 - 22.77 = 0.34. Similarly, for two people
in Tx = 1, one with Content = 2 and the other with Content = 1, the predicted
difference in Emp is 17.95 - 17.61 = 0.34.
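
The predicted scores discussed above follow directly from the fixed part of the
single equation. The sketch below evaluates it for each combination of Tx and
Content, using the rounded coefficients 22.77, -5.50, and 0.34 from the text;
the function name predicted_emp is hypothetical, and the random terms are
omitted because their expected value is zero.

```python
def predicted_emp(tx, content, g00=22.77, g01=-5.50, g10=0.34):
    """Fixed part of the combined equation; content is grand-mean centered."""
    return g00 + g01 * tx + g10 * content


for content in (0, 1, 2):
    tx0 = predicted_emp(0, content)
    tx1 = predicted_emp(1, content)
    print(f"Content = {content}: Tx=0 -> {tx0:.2f}, Tx=1 -> {tx1:.2f}")
```

Because the model contains no Tx by Content interaction, the gap between the
two treatment columns is the same 5.50 points at every Content value.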


8.11 FINAL COMMENTS ON HLM 


It should be emphasized that this chapter has provided only a very brief
introduction to multilevel modeling and the use of HLM software to estimate
the model parameters. It should also be noted that despite the ease with which
researchers can use software such as HLM to estimate their multilevel models,
it behooves users to ensure that they understand the model being estimated,
how to interpret the resulting parameter estimates and associated significance
tests, and the appropriateness of the assumptions made. While not demonstrated
in this chapter, because this is an introductory treatment, a residual file
can be easily created. As Raudenbush et al. note on p. 13 of the HLM5 manual
and repeat on p. 15 of the HLM6 manual:


“After fitting a hierarchical model, it is wise to check the tenability of the
assumptions underlying the model:

Are the distributional assumptions realistic?
Are results likely to be affected by outliers or influential observations?
Have important variables been omitted or nonlinear relationships been ignored?”


HLM software can be used to provide the residuals for models estimated.

Aside from HLM, there are several other software programs that can be used to 
estimate multilevel models including MLwiN (Goldstein, et al., 1998), SAS Proc 
Mixed (Littell, Milliken, Stroup, & Wolfinger, 1996; see Singer (1998) for a 
well-written introductory article describing use of PROC MIXED for multilevel 
modeling) and VARCL (Longford, 1988) among others. Even the latest versions of 
SPSS include some basic hierarchical modeling estimation routines. Kreft and De 
Leeuw (1998) provide some good descriptions of the available multilevel pro- 
grams as well as website references for the interested user. 

The list of multilevel textbooks provided earlier in the chapter can provide the 
reader with more detailed worked examples as well as fuller descriptions of the es- 
timation used and the assumptions made when analyzing these multilevel models. 
In addition, the texts provide excellent resources for the reader to find out about 
more advanced multilevel modeling techniques including models with dichoto- 
mous or ordinal outcomes, models with multivariate outcomes, meta-analytic 
models, and models for use with cross-classified data structures. 

The same caveats that apply to model-building using simple single-level regres- 
sion analyses apply to model-building with multilevel models. Choosing a final 
model based on resulting estimates from a series of models can lead to selection of 
a model that is very sample-specific. As with any kind of model-fitting, if the ana- 
lyst has a large enough sample, then the data can be randomly divided to provide a 
cross-validation sample to use to test the final model selected based on results from 
the other half of the sample (Hox, 2002). 
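
A random division of this kind might be sketched as follows. The text does not
say how the split should be made for multilevel data; dividing whole level two
units (here, the 30 classroom IDs of the MATH_12 example) so that students stay
with their classmates is an assumption of this sketch, as are the function
name split_clusters and the fixed seed.

```python
import random


def split_clusters(cluster_ids, seed=0):
    """Randomly divide level two units into two halves."""
    ids = sorted(set(cluster_ids))
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    rng.shuffle(ids)
    half = len(ids) // 2
    return set(ids[:half]), set(ids[half:])


calibration, validation = split_clusters(range(1, 31))
print(sorted(calibration))
print(sorted(validation))
```

The model would be selected using the students in the calibration classrooms
and then re-estimated, without further changes, on the validation classrooms.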

It is hoped that researchers continue to become more familiar with the
complexities of multilevel modeling and that these models will be increasingly
applied in the analysis of relevant data structures.

The data files for the MATH_12 example follow. There are 135 students and 30
teachers in this artificial data set.


2lvl_student_L1

teachid studid gender math_12 iim

1 1 5 0 94 35 

2 1 7 1 107 35 

3 1 9 0 97 42 

4 2 14 0 92 42 

5 2 16 0 94 39 

6 2 19 0 92 49 

7 2 20 1 105 41 

8 3 21 0 93 40 

9 3 28 1 101 50 
10 3 29 1 113 45 
11 4 32 1 103 42 
12 4 35 1 101 43 
13 4 36 1 108 47 
14 5 47 0 98 37 
15 5 49 0 95 37
16 5 50 1 99 32
17 6 52 0 78 33 
18 6 58 0 83 38 
19 6 60 0 81 34 
20 7 61 0 96 40 
21 7 68 0 93 43 
22 7 69 1 99 37 
23 8 72 0 92 38 
24 8 74 0 93 43 
25 8 76 0 77 36 
26 8 78 0 98 43 
27 8 80 1 99 47 
28 9 81 0 95 49 
29 9 82 0 96 47 
30 9 83 0 98 48 
31 9 84 0 93 45 
32 9 90 0 94 49 
33 10 92 1 105 46 
34 10 94 1 104 45 
35 10 96 1 104 45 
36 10 97 1 107 46 
37 10 100 1 110 47 
38 11 101 1 107 39 
39 11 105 1 97 36 
40 11 107 1 101 40 
41 11 108 0 82 36 
42 11 109 0 86 34 
43 12 112 1 100 41 
44 12 115 1 105 91 
45 12 117 0 94 44 

46 12 118 1 116 53 
47 12 119 1 108 49 
48 12 120 1 99 51 
49 13 123 1 106 41 
50 13 125 1 116 50 
51 13 128 0 96 40 
52 13 129 1 105 44
53 13 130 1 96 37 
54 14 132 1 105 44 
55 14 133 1 105 42 
56 14 139 1 111 41 
57 14 140 1 112 48 
58 15 141 1 99 41 
59 15 142 0 93 40 
60 15 144 1 106 42 
61 15 145 1 106 44 
62 15 149 1 108 44 
63 16 151 1 106 48 
64 16 154 1 106 43 
65 16 155 1 99 45 
66 16 157 0 91 41 
67 16 158 1 98 45 
68 17 161 0 91 32 
69 17 164 0 83 30 
70 17 165 1 117 48 
71 17 166 1 102 43 
72 17 169 1 116 53 
73 17 170 1 105 46 
74 18 171 0 96 42 
75 18 172 1 104 45 
76 18 174 0 78 33 
77 18 175 0 94 43 
78 18 176 0 95 37 
79 18 177 0 81 33 
80 18 178 0 88 36 
81 18 179 0 96 37 
82 18 180 0 95 35 
83 19 181 0 96 45 
84 19 183 0 91 45 
85 19 185 0 93 48 
86 19 187 0 87 44 
87 19 188 0 83 51 
88 19 190 0 83 43 
89 20 194 1 101 43 
90 20 199 0 85 37 

91 20 200 0 92 38 

92 21 205 1 101 45 

93 21 206 1 104 40 

94 21 207 1 100 44 

95 21 208 1 101 47 

96 21 210 1 102 41 

97 22 212 1 113 44 

98 22 214 1 100 39 

99 22 216 0 97 49 
100 22 218 1 109 35 
101 22 219 1 111 42 
102 22 220 1 99 37 
103 23 221 0 92 42 
104 23 223 0 90 33 
105 23 224 0 95 37 
106 23 225 1 105 41 
107 24 233 0 98 38 
108 24 234 1 103 43 
109 24 235 1 101 42 
110 24 236 0 96 37 
111 24 238 1 102 40 
112 25 243 0 90 42 
113 25 246 0 90 45 
114 25 247 0 88 41 
115 25 249 0 95 42 
116 26 254 0 96 45 
117 26 256 1 101 43 
118 26 258 1 107 44 
119 26 259 1 104 40 
120 26 260 1 99 42 
121 27 261 1 108 46 
122 27 263 0 85 38 
123 27 266 0 91 39 
124 27 267 0 89 38 
125 27 270 1 97 38 
126 28 271 1 108 56 
127 28 272 1 97 41 
128 28 273 1 111 49 
129 28 274 1 105 50 
130 28 280 1 109 45 
131 29 287 0 93 40 
132 29 290 0 82 30 
133 30 297 1 108 45 
134 30 298 1 104 38 
135 30 299 0 84 24 


2lvl_class_L2 


teachid resource 
1 1 7 
2 2 7 
3 3 5 
4 4 7 
5 5 7 
6 6 5 
7 7 5 
8 8 3 
9 9 5 
10 10 6 
11 11 5 
12 12 6 
13 13 5 
14 14 6 
15 15 5 
16 16 5 
17 17 6 
18 18 7 
19 19 4 
20 20 6 
21 21 6 
22 22 5
23 23 5 
24 24 6 
25 25 5 
26 26 6 
27 27 6 
28 28 5 
29 29 6 
30 30 7


Appendix A


Data Sets 


CONTENTS 

A.1 Clinical Data 

A.2 Alcoholics Data 

A.3 Sesame Street Data 

A.4 Headache Data 

A.5 Cartoon Data 

A.6 Attitude Data 

A.7 National Academy of Sciences Data 
A.8 Agresti Home Sales Data 


CLINICAL DATA 


The data for this study were drawn from the archives of Children's Hospital
Medical Center's Department of Psychology in Cincinnati, Ohio. Thirty-seven
subjects were eventually selected from each of three diagnostic groups:


1. Encopretic children: These children have problems with fecal soiling.
   Clinically, parents report that the child “forgets,” is too engrossed in
   other activities, or delays in going to the toilet. Problems that have been
   found to be associated with encopresis include anxiousness, social
   withdrawal, motor integration, and attention deficit.

2. Hyperactive children.

3. General clinic group: adjustment disorder, disturbed.


The selection criteria for the subjects were: male, 7-14 years of age, Full Scale 
WISC-R score above 85, and an absence of any concurrent disability or condition 
that could account for bowel control or attention problems. 

Factor scores on the WISC-R were used to assess cognitive development and
attention/information intake processes. The factor scores are for verbal
comprehension (VERBCOMP), perceptual organization (PERCORG), and freedom from
distractibility (FREEDIST).


CLINICAL DATA 


GPID VERBCOMP PERCORG FREEDIST   GPID VERBCOMP PERCORG FREEDIST

1 7.25 11.00 9.00 2 [illegible] 11.50 8.33
1 13.50 13.75 9.67 2 9.00 8.75 8.67 
1 8.00 8.25 6.67 2 7.50 9.00 7.33 
1 8.50 11.75 9.00 2 8.00 9.75 5.33 
1 6.25 12.75 7.67 2 14.00 10.25 11.00 
1 12.50 14.00 10.33 2 10.00 11.50 8.00 
1 11.00 13.25 8.67 2 11.75 6.50 8.33 
1 11.25 10.75 9.67 2 9.75 9.25 8.67 
1 11.75 13.75 14.67 2 10.00 8.75 7.67 
1 7.50 12.50 6.33 2 10.50 13.50 12.00 
1 8.00 11.00 7.67 2 9.25 8.25 8.33 
1 10.25 11.50 7.00 2 9.75 10.25 6.00 
1 9.75 13.25 8.00 2 11.50 9.00 11.33 
1 9.00 11.00 7.33 2 12.50 12.50 9.67 
1 10.00 10.50 10.00 2 13.75 12.50 11.00 
1 12.00 15.25 12.67 2 11.00 11.25 9.33 
1 9.50 8.50 6.00 2 11.00 8.25 10.00 
1 13.25 10.75 8.67 2 9.00 11.25 9.67 
1 10.50 11.50 5.33 3 8.50 8.00 9.00 
1 10.50 9.25 8.67 3 9.50 10.50 6.67 
1 10.50 11.75 7.67 3 8.00 10.25 9.67 
1 8.25 11.00 5.67 3 9.25 11.50 5.33 
1 [illegible] 9.25 8.00 3 9.25 9.75 5.33
1 12.75 13.75 12.00 3 15.50 12.50 14.00 
1 8.75 9.75 9.00 3 9.25 11.00 10.00 
1 7.75 8.75 5.00 3 8.25 10.50 7.00 
1 9.50 12.00 7.00 3 11.50 13.00 8.00 
1 10.50 11.50 5.33 3 9.25 10.75 8.67 
1 15.50 14.00 8.00 3 10.00 10.50 9.33 
1 9.75 9.75 6.33 3 12.50 12.00 8.33 
1 9.50 12.50 9.33 3 10.00 13.25 9.67 
1 8.00 7.50 5.00 3 10.75 12.50 9.00 
1 14.75 12.25 8.33 3 11.25 13.50 8.00 
1 9.25 9.50 9.00 3 12.25 10.75 9.00 


1 9.25 11.75 7.33 3 16.00 13.25 12.67
1 8.50 9.50 8.00 3 12.25 13.75 14.00 
1 12.25 10.75 9.00 3 11.00 11.00 9.67 
2 9.75 10.25 11.00 3 12.25 12.00 10.33 
2 9.50 11.75 11.33 3 6.50 8.25 10.33 
2 10.75 12.00 9.67 3 8.75 9.75 10.33 
2 9.00 11.50 7.67 3 14.25 16.00 16.33 
2 11.25 13.75 12.33 3 6.00 10.00 7.67 
2 12.25 11.75 12.00 3 14.00 13.50 9.00 
2 12.00 8.75 5.67 3 12.00 9.25 8.00 
2 11.00 10.25 7.00 3 10.00 9.50 10.33 
2 10.25 8.75 6.33 3 12.50 12.75 9.67 
2 12.50 13.75 9.00 3 11.50 12.75 10.33 
2 10.25 13.25 8.00 3 10.75 10.75 8.33 
2 12.75 13.25 11.67 3 9.00 10.75 11.33 
2 6.75 9.50 7.33 3 11.50 13.00 8.00 
2 9.00 10.00 8.00 3 11.25 14.25 11.33 
2 9.25 9.75 10.00 3 10.00 10.50 9.33 
2 13.75 13.25 7.33 3 11.25 12.50 13.00 
2 9.00 8.50 11.33 3 8.00 10.25 9.67 
2 9.75 7.25 8.67 3 9.75 10.50 9.67 
2 8.25 10.25 7.33 




DESCRIPTION OF ALCOHOLICS DATA SET 


This is data from a study into the causes of relapse among alcoholics conducted at 
an Addiction Research Unit in London, England. The 251 subjects are those who 
presented themselves for treatment of alcoholism at several hospitals and related 
agencies. The subjects were divided into three groups:


Group 1: those never having previously experienced relapse after trying to 
give up heavy drinking. 

Group 2: those who claimed to have relapsed, but no more than two or three 
times. 

Group 3: those who had a longer history of relapse of four or more times. 


The dependent variables for the study came from a Relapse Precipitants
Inventory, which the subjects were asked to fill out at the point of admission
into treatment. This inventory was developed from work on a previous survey that 
had identified three areas of vulnerability to relapse, the measurement of which 
yielded the following dependent variables: 


UNPLMOOD: unpleasant mood states; for example, depression.

EUPHORIC: euphoric states and related situations; for example, celebrations
and parties.

LESSVIGL: an area designated as lessened vigilance; for example, a temptation
to believe that one or more drinks would cause no problem.


The grouping variable is given first for each subject, with UNPLMOOD,
EUPHORIC, and LESSVIGL in that order.


ALCOHOLICS DATA


Grp UM ES LV Grp UM ES LV 
2 27 0 3 2 12 14 4 
1 37 19 6 1 31 13 9 
2 2 2 6 2 2 3 3 
3 24 15 3 3 3 4 0 
1 34 13 4 1 26 9 9 
1 37 10 7 1 33 14 9 
1 34 23 9 1 18 7 6 
3 7 9 7 1 16 8 9 
3 10 0 0 2 13 9 7 
3 6 7 6 1 15 12 3 
2 30 16 9 1 25 18 6 
3 24 21 8 1 23 11 4
1 22 6 8 1 26 6 9 
1 24 7 7 1 18 0 2 
1 21 18 6 2 6 6 4 
1 0 4 3 1 19 13 6 
1 14 7 2 1 34 23 9 
2 14 13 0 1 26 21 7 
1 10 13 7 2 28 22 7 
1 13 18 9 2 36 21 9 
3 41 16 9 1 19 16 9 
1 24 15 3 1 32 18 8 
2 1 6 3 2 3 5 4 
2 15 16 7 1 9 16 9 
2 6 2 0 2 0 6 7 
1 34 16 6 1 8 4 5 
1 37 19 9 1 42 22 9 
2 2 4 0 1 41 21 8 
1 28 6 9 2 16 10 2 
2 21 15 8 2 7 12 4 
3 13 3 3 2 36 11 7
1 32 9 8 3 18 10 9 
2 25 0 9 1 24 9 8 
3 0 0 0 1 37 19 9 
2 27 13 4 2 3 2 0 
1 12 10 4 1 16 12 8 
2 11 11 7 2 14 5 7
1 30 13 6 1 35 16 6 
2 32 19 5 1 35 15 7 
1 9 7 1 1 34 21 9 
1 32 24 9 1 34 23 7 
2 31 18 5 2 31 7 9 
3 26 16 3 1 35 10 5 
3 0 2 1 1 24 7 6 
1 5 3 4 1 29 16 9 


1 15 5 3 1 17 3 9 
1 29 18 4 1 11 12 6 
1 28 21 9 2 39 23 9 
2 16 14 9 1 21 10 9 
3 15 5 8 1 11 3 4 
1 27 15 7 2 33 17 9 
2 21 12 Э 2 0 0 0 
1 42 23 9 2 31 14 7 
2 21 12 9 1 18 14 2 
1 16 0 3 3 41 12 9 
1 26 0 9 1 5 10 9 
1 20 2 5 3 30 12 6 
1 1 1 1 2 3 9 6 
3 15 23 9 2 9 8 2
1 16 13 3 1 17 20 7 
3 0 10 3 2 15 3 6 
2 22 10 5 1 28 21 6 
3 20 6 1 1 14 10 2 
3 14 3 4 2 13 14 7 
1 7 E] 3 2 0 0 0 
3 14 12 4 2 0 6 2 
2 14 1 4 1 6 1 2 
2 40 5 4 1 35 16 9 
1 24 9 7 1 23 3 8 
2 15 5 5 3 10 5 6 
1 8 17 6 1 29 20 9 
2 5 4 3 1 23 12 6 
1 25 1 9 2 4 0 2 
2 17 2 9 1 11 13 8 
1 17 18 8 2 34 16 7 
1 26 18 8 3 0 0 0 
1 29 10 6 1 26 5 8 
1 26 15 9 3 15 15 5 

1 3 5 2 
2 20 14 9 3 34 11 7 
1 10 17 9 2 6 19 6 
3 31 20 6 1 18 12 5 
1 18 7 3 1 3 5 7 
1 22 13 4 1 35 15 9 
2 21 9 6 2 17 5 5 
2 3 7 4 2 29 16 7 
2 16 1 0 2 32 20 8 
1 15 12 7 1 13 5 3 
3 7 У 1 2 37 22 9 
2 18 12 4 1 26 13 6 


1 27 11 9 1 17 9 4 
1 18 9 8 1 28 10 5
2 0 0 0 3 28 8 6
1 8 14 6 3 5 3 0
2 7 7 6 2 25 19 8
1 7 4 5 3 19 23 3
2 19 21 8 2 33 14 5
2 30 16 8 2 7 17 4
3 6 16 8 1 19 13 6
1 28 13 9 1 32 13 9
1 17 8 3 3 0 3 0
2 17 13 6 
1 4 4 3 2 16 18 2 
2 12 16 6 1 39 18 9 
1 2 1 3 1 28 20 3
2 40 4 3 1 24 16 8 
1 18 9 8 2 24 14 6 
2 11 11 6 2 19 5 7 
1 11 21 9 3 33 19 3 

1 15 10 7 
2 23 12 7 1 24 16 7 
1 32 20 9 2 33 14 7 
2 14 3 3 1 26 3 5 
2 10 6 4 2 26 12 8 
3 32 7 4 1 26 16 6 

2 40 20 9 
2 22 19 5 3 33 17 7 
1 0 0 0 2 30 13 8
2 29 16 6 2 22 20 7 
1 0 10 8 1 21 15 6 
1 23 9 7 2 17 9 6 
1 37 22 9 1 28 23 9 
2 25 17 6 1 15 17 9 
1 17 10 3 3 11 5 6 
2 18 3 2 3 39 15 6 
2 19 22 8 2 30 17 9 
1 16 4 5 2 22 11 3 
2 0 2 2


DESCRIPTION OF SESAME STREET DATA BASE


These data are part of a large data set that evaluated the impact of the first year of the 
Sesame Street television series. Sesame Street was concerned mainly with teaching 
preschool-related skills to children in the 3- to 5-year age range, with special emphasis 
on reaching 4-year-old disadvantaged children. The format of the show was 
designed to hold young children’s attention through action-oriented, short-duration 
presentations teaching specific preschool cognitive skills and some social skills. 
Each show was one hour long and involved much repetition of concepts within and 
across shows. 

A main concern for the evaluation, which was carried out at Educational 
Testing Service, was that it would permit generalization to the populations of chil- 
dren of most interest to the producers of the program (the Children’s Television 
Workshop). Five populations were of interest: 


1. Three to five year old disadvantaged children from inner city areas in various 
   parts of the country. 

2. Four year old advantaged suburban children. 

3. Advantaged rural children. 

4. Disadvantaged rural children. 

5. Disadvantaged Spanish speaking children. 


Children representative of these populations were sampled from five different 
sites in the United States. 

Both before and after viewing the series the children were tested on a variety of 
cognitive variables (variables 8 through 19 in the data set), including knowledge of 
body parts, knowledge about letters, knowledge about numbers, etc. 

The variables are arranged on the file as follows: 




Variable No.  Variable Name  Description 
 1            ID             Subject identification number 
 2            SITE           Five different sampling sites, coded as 1, 2, 3, 4, or 5 
 3            SEX            Male = 1, Female = 2 
 4            AGE            Age in months 
 5            VIEWCAT        Viewing categories, coded from 1 if the children rarely watched 
                             the show to 4 if the children watched the show on average more 
                             than 5 times a week 
 6            SETTING        Setting in which Sesame Street was viewed, coded as 1 for 
                             home and 2 for school 
 7            VIEWENC        A treatment condition in which some children were encouraged 
                             to view Sesame Street (coded 1) and others were not (coded 2) 
 8            PREBODY        Pretest on knowledge about body parts (maximum score 32): 
                             naming and functions of body parts 
 9            PRELET         Pretest on knowledge about letters (maximum score 58): 
                             recognizing letters, naming capital letters, matching 
                             letters in words 
10            PREFORM        Pretest on knowledge about forms (maximum score 20): 
                             recognizing and naming forms 
11            PRENUMB        Pretest on knowledge about numbers (maximum score 54): 
                             recognizing and naming numbers, counting, addition, and 
                             subtraction 
12            PRERELAT       Pretest on knowledge of relational terms (maximum score 17): 
                             amount, size, and position relationships 
13            PRECLASF       Pretest on knowledge of classification skills (maximum 
                             score 24): classifying by size, form, number, and function 
14            POSTBODY       Posttest on knowledge of body parts 
15            POSTLET        Posttest on knowledge of letters 
16            POSTFORM       Posttest on knowledge of forms 
17            POSTNUMB       Posttest on knowledge of numbers 
18            POSTREL        Posttest on knowledge of relations 
19            POSTCLAS       Posttest on knowledge of classification skills 
20            PEABODY        Mental age scores obtained from administration of the Peabody 
                             Picture Vocabulary Test as a pretest measure of vocabulary 
                             maturity 
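
As a rough illustration of how the variable order above maps onto a line of the file, the following Python sketch splits one whitespace-delimited record into named values. It is only a sketch under stated assumptions: Python is not one of the packages used in this book, the helper names `parse_record` and `letter_gain` are hypothetical, and the sample line is simply the first record of the listing typed in by hand.

```python
# Variable names in file order, taken from the variable list above.
SESAME_COLUMNS = [
    "ID", "SITE", "SEX", "AGE", "VIEWCAT", "SETTING", "VIEWENC",
    "PREBODY", "PRELET", "PREFORM", "PRENUMB", "PRERELAT", "PRECLASF",
    "POSTBODY", "POSTLET", "POSTFORM", "POSTNUMB", "POSTREL", "POSTCLAS",
    "PEABODY",
]


def parse_record(line):
    """Turn one whitespace-delimited line into a dict of integer values."""
    fields = line.split()
    if len(fields) != len(SESAME_COLUMNS):
        raise ValueError(
            f"expected {len(SESAME_COLUMNS)} fields, got {len(fields)}"
        )
    return dict(zip(SESAME_COLUMNS, (int(f) for f in fields)))


def letter_gain(record):
    """Posttest minus pretest score on the letters test (hypothetical helper)."""
    return record["POSTLET"] - record["PRELET"]


# Sample input only: the first record of the data listing.
sample = "1 1 1 66 1 2 1 16 23 12 40 14 20 18 30 14 44 14 23 62"
record = parse_record(sample)
print(record["AGE"], letter_gain(record))
```

Checking the field count before converting is a deliberate choice here: a line with a missing or fused value is rejected rather than silently shifted into the wrong variables.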
SESAME STREET DATA 


ID SITE SEX AGE VIEWCAT SETTING VIEWENC PREBODY PRELET PREFORM PRENUMB PRERELAT PRECLASF POSTBODY POSTLET POSTFORM POSTNUMB POSTREL POSTCLAS PEABODY 
1 1 1 66 1 2 1 16 23 12 40 14 20 18 30 14 44 14 23 62 
2 1 2 67 3 2 1 30 26 9 39 16 22 30 37 17 39 14 22 80 
3 1 1 56 3 2 2 22 14 9 9 9 8 21 46 15 40 9 19 32 
4 1 1 49 1 2 2 23 11 10 14 9 13 21 14 13 19 8 15 27 
5 1 1 69 4 2 2 32 47 15 51 17 22 32 53 18 54 14 21 71 
6 1 2 54 3 2 2 29 26 10 33 14 14 27 36 14 39 16 24 32 
7 1 2 47 3 2 2 23 12 11 13 11 12 22 45 12 44 12 15 28 
8 1 1 51 2 2 1 32 48 19 52 15 23 31 47 18 51 17 23 38 
9 1 1 69 4 2 1 27 44 18 42 15 20 32 50 17 48 14 24 49 
10 1 2 53 3 2 1 30 38 17 31 10 17 32 52 19 52 17 24 32 
11 1 2 58 2 2 2 25 48 14 38 16 18 26 52 15 42 10 17 43 
12 1 2 58 4 2 2 21 25 13 29 16 21 17 29 15 40 10 19 58 
13 1 2 49 1 2 2 28 8 9 13 8 12 20 16 9 18 10 13 39 
14 1 1 64 2 2 1 26 11 15 21 10 15 26 28 15 35 16 14 43 
15 1 2 58 2 2 1 23 15 9 16 9 11 28 21 10 22 10 17 56 
16 1 1 49 3 2 1 25 12 17 24 12 18 28 45 14 45 13 21 37 
17 1 1 57 2 2 1 25 15 13 16 10 18 25 24 16 28 8 18 43 
18 1 1 45 4 2 1 16 12 8 1 6 3 25 16 1) 17 9 9 29 
19 1 1 45 3 2 1 25 16 12 23 10 13 32 46 18 35 14 19 45 
20 1 1 60 3 2 2 19 19 8 23 M 10 28 50 12 38 12 13 51 
21 1 2 6 4 2 1 29 24 14 41 10 23 29 48 20 51 15 24 55 
22 1 1 44 4 1 1 25 15 17 22 11 16 32 42 19 45 15 19 49 
23 1 2 38 3 1 1 20 9 2 7 8 9 22 23 17 19 15 14 31 
24 1 1 35 4 1 1 11 6 8 16 8 9 22 27 16 20 14 15 40 
25 1 2 42 2 1 1 15 7 8 П 12 7 14 13 7 21 12 10 48 
26 1 2 50 2 1 1 26 14 10 36 ІЗ 17 25 18 14 42 13 18 35 
27 1 1 61 4 2 2 28 42 16 40 16 11 24 27 15 20 14 14 62 
28 1 2 34 4 1 1 1] 103 7 10 5 1 21 107 13 13 10 15 42 
29 1 1 60 3 1 2 23 13 9 23 11 14 28 20 18 45 14 21 58 
30 1 2 39 2 1 І H 5 5 5 5 1 2 15 13 9 % 12 29 
31 1 10 39 3 1 1 24 4 11 25 11 17 21 11 12 13 9 10 49 
32 1 2 4 2 1 10 24 8 3 14 8 8 21 17 10 18 9 14 30 
33 1 2 55 3 1 1 31 15 17 45 16 24 32 43 18 46 13 21 62 
34 1 2 42 3 1 1 23 11 7 15 6 7 29 27 14 33 9 19 58 
35 1 1 50 4 1 1 18 17 12 28 12 17 29 41 15 48 12 22 55 
36 1 1 58 2 1 1 13 12 6 10 6 11 29 23 11 27 8 10 33 
37 1 2 59 3 1 2 27 7 13 23 12 10 32 39 18 49 16 19 55 
38 1 2 36 1 1 2 11 12 9 5 5 3 12 12 6 13 9 6 27 
39 1 1 51 2 1 1 32 16 16 34 15 17 21 17 6 21 8 12 62 


40 1 1 51 2 1 1 31 18 13 33 15 14 21 16 11 7 9 11 58 
41 1 1 48 3 1 1 13 14 8 8 10 1 21 22 10 28 8 19 34 
42 1 1 43 2 1 1 17 13 14 13 1 14 24 19 12 22 11 20 32 
43 1 2 35 3 1 2 23 12 8 9 5 5 29 11 8 9 п 11 32 
44 1 2 36 1 1 2 1 2 6 5 2 4 21 6 П 6 6 7 28 
45 1 2 39 2 1 2 20 18 6 4 4 6 19 8 11 22 10 10 29 
46 1 1 45 4 1 2 14 13 9 16 9 12 29 48 17 48 14 19 35 
47 1 1 58 3 1 2 30 38 15 45 14 18 32 48 19 46 14 23 67 
48 1 2 38 3 1 1 13 10 7 8 5 7 26 36 14 20 10 16 29 
49 1 2 57 4 1 1 26 15 11 22 10 15 24 20 18 28 12 18 35 
50 1 10 49 3 1 2 26 35 10 47 13 17 26 13 7 12 11 14 67 
51 1 10 55 1 2 2 24 11 10 18 8 10 20 10 14 23 10 10 39 
52 1 2 44 4 2 2 25 39 18 41 9 21 30 47 20 50 11 23 90 
53 1 1 56 1 2 2 13 11 6 15 10 11 15 13 9 13 10 6 46 
54 1 2 48 2 2 2 17 11 9 14 8 9 26 32 15 27 11 11 39 
55 1 1 50 2 2 2 16 10 8 9 7 6 21 15 17 17 12 15 34 
56 1 10 52 1 2 2 16 15 6 B 11 B 19 14 14 18 7 16 38 
57 1 2 51 2 2 1 24 14 10 20 10 17 28 21 17 36 19 17 34 
58 1 2 58 1 2 2 25 17 17 23 14 15 25 16 19 28 4 20 37 
59 1 2 48 3 2 2 13 10 10 13 9 7 27 15 14 23 | 12 33 
60 1 1 54 4 2 2 16 13 10 10 9 7 19 20 14 19 12 11 36 
61 2 10 52 4 2 2 20 35 15 21 8 20 27 48 19 47 15 22 49 
62 2 2 48 1 2 2 20 12 П 13 7 11 28 19 17 1 1 17 53 
63 2 1 55 4 2 2 28 13 10 29 11 12 30 31 19 45 15 22 65 
64 2 1 55 2 2 2 23 16 5 32 10 13 28 28 15 46 13 17 55 
65 2 2 55 2 1 2 30 27 1 39 12 17 32 40 19 52 17 23 85 
66 2 2 56 1 1 1 26 18 10 30 12 9 29 46 13 44 13 17 43 
67 2 2 50 3 2 1 31 15 10 24 11 20 31 43 18 52 14 23 65 
68 2 1 51 3 2 1 28 19 14 37 13 15 32 47 19 48 15 22 75 
69 2 1 58 3 2 2 30 14 17 37 13 20 28 38 15 36 16 18 85 
70 2 2 55 4 2 2 27 10 10 12 14 16 31 42 16 31 13 20 40 
71 2 2 4 4 1 2 24 22 14 42 14 21 30 49 19 50 14 22 58 
72 2 2 51 4 1 2 20 13 П 22 П 15 27 17 11 20 14 14 42 
73 2 1 52 4 2 1 29 13 9 18 10 10 30 23 17 28 12 17 69 
74 2 2 54 3 2 1 30 26 13 23 10 17 32 42 12 37 12 19 58 
75 2 1 47 4 1 1 19 12 11 16 11 12 28 43 18 38 14 17 37 
76 2 1 50 4 1 1 31 30 15 47 15 19 23 48 19 49 13 20 62 
77 2 2 55 4 1 1 31 13 18 39 15 24 31 51 19 50 14 23 99 
78 2 1 50 3 2 1 32 27 13 29 13 12 31 45 19 53 16 23 75 


79 2 2 57 2 2 1 31 19 M 36 13 11 32 50 17 47 15 22 82 
80 2 2 55 4 1 2 20 12 9 16 11 18 30 30 14 18 12 16 47 
81 2 2 55 4 1 2 26 14 8 24 13 12 30 45 19 43 15 24 62 
82 2 1 50 3 2 1 30 44 15 45 12 11 32 53 19 52 15 23 67 
83 2 1 52 3 2 1 26 13 11 34 12 12 30 45 17 43 13 21 40 
84 2 2 45 2 2 1 28 12 9 16 7 8 28 21 14 32 9 17 41 
85 2 2 52 3 2 1 24 14 8 18 8 7 28 43 12 41 13 15 56 
86 2 1 53 4 1 1 26 17 15 32 13 16 27 37 15 31 14 23 73 
87 2 2 53 4 2 2 29 10 17 23 13 15 32 51 19 48 16 21 85 
88 2 2 53 2 2 2 23 12 12 15 11 17 28 20 15 19 8 19 59 
89 2 2 56 3 1 2 28 29 11 43 13 22 31 32 15 40 15 21 58 
90 2 2 54 4 2 2 32 46 15 48 13 21 32 51 19 50 16 23 92 
91 2 10 50 2 2 1 22 17 8 18 12 14 25 16 14 32 14 17 69 
92 2 1 50 3 1 1 29 25 14 35 15 20 26 36 16 31 8 15 56 
93 2 2 53 4 1 1 25 17 12 30 ІЗ 17 29 40 17 40 13 22 78 
94 2 1 45 4 1 1 21 16 12 15 11 17 29 36 19 29 10 16 58 
95 2 2 56 4 1 1 32 22 15 32 11 13 31 46 20 51 14 24 78 
96 2 1 53 3 1 1 31 14 16 29 11 17 32 43 19 42 13 22 67 
97 2 2 46 4 1 1 22 28 14 20 5 15 32 42 ІЗ 29 15 18 69 
98 2 2 46 2 1 2 30 18 14 23 11 11 29 33 13 36 8 16 53 
99 2 1 50 3 1 2 18 13 8 М ІП 12 19 23 11 31 12 17 55 
100 2 2 47 1 1 2 17 10 5 ПІ 8 7 18 19 9 8 10 13 53 
101 2 1 56 2 2 1 27 11 15 22 13 14 30 47 20 45 15 24 67 
102 2 2 46 3 1 1 27 13 B 22 12 13 28 36 17 46 1 20 62 
103 2 2 45 4 1 1 21 23 14 27 13 20 31 48 19 44 11 24 59 
104 2 2 46 4 1 1 19 17 4 19 10 13 28 36 16 29 10 15 46 
105 2 2 47 3 1 1 31 9 14 24 11 16 32 42 17 42 11 20 58 
106 2 1 32 2 2 1 26 16 12 30 14 16 27 29 20 41 15 21 92 
107 2 2 52 4 2 1 24 15 12 30 14 14 31 45 18 49 13 23 48 
108 2 1 56 2 1 1 21 22 12 25 12 15 29 37 14 46 14 18 65 
109 2 1 48 4 1 1 28 22 13 19 11 8 32 48 18 43 15 19 67 
110 2 1 49 3 1 1 25 16 ІЗ 18 15 15 27 48 13 45 15 18 59 
111 2 2 55 4 2 1 32 8 13 23 11 17 29 35 18 36 ІІ 19 39 
112 2 2 45 3 1 1 22 15 12 20 9 14 25 21 15 21 9 16 47 
113 2 1 45 3 1 1 32 16 14 30 11 17 25 26 17 39 13 19 51 
114 2 2 48 4 1 1 31 6 8 13 7 13 29 32 15 23 7 16 43 
115 2 1 58 1 1 2 19 14 12 23 11 10 28 15 10 34 ПІ 7 65 
116 3 1 55 2 1 1 20 16 7 14 9 9 21 11 13 20 13 13 39 
117 3 2 48 1 1 2 20 15 5 B 8 7 21 14 3 17 10 7 33 


118 3 2 52 3 1 1 14 6 3 9 4 8 18 19 20 18 12 20 34 
119 3 2 58 1 1 1 20 11 5 25 9 7 26 16 10 23 12 12 35 
120 3 1 50 3 2 2 13 12 5 11 10 9 21 18 10 19 10 9 32 
121 3 1 58 3 2 2 22 19 11 35 10 17 23 44 18 46 13 19 44 
122 3 2 49 4 2 2 14 B 7 5 S 7 17 16 12 19 12 15 31 
123 3 1 56 4 2 2 24 17 9 16 11 10 29 35 17 40 14 19 44 
124 3 1 50 1 1 2 7 14 4 10 5 5 19 15 13 14 13 17 27 
125 3 2 49 1 1 1 20 18 3 14 6 6 11 12 6 8 9 5 29 
126 3 1 4 2 1 1 15 14 8 23 14 12 25 18 14 30 12 16 47 
127 3 1 57 2 1 1 26 14 9 23 15 9 28 15 14 24 12 9 35 
128 3 1 44 1 2 2 12 9 7 14 9 14 13 13 7 17 9 7 32 
129 3 2 4 2 2 2 16 14 9 10 11 9 22 17 9 23 8 13 36 
130 3 2 58 4 2 1 17 9 1 28 8 13 29 13 12 29 12 15 38 
131 3 2 60 4 2 1 31 19 1 27 11 16 31 31 17 38 13 22 42 
132 3 2 40 2 1 1 12 14 3 17 6 6 16 13 6 17 10 9 27 
133 3 2 37 2 1 1 7 4 6 4 S 4 13 13 6 14 9 11 60 
134 3 1 45 1 1 1 12 5 5 9 7 7 15 13 12 20 12 11 28 
135 3 1 60 3 1 1 17 18 9 14 7 6 32 36 13 32 13 12 33 
136 3 1 52 2 1 2 18 13 9 24 10 16 25 15 12 26 11 10 55 
137 3 2 46 4 1 1 20 12 4 17 8 8 28 22 17 38 14 20 29 
138 3 2 60 4 1 1 23 16 9 25 11 14 29 26 17 38 16 22 46 
139 3 1 60 3 1 1 17 11 10 15 10 14 11 13 7 16 9 13 33 
140 3 1 59 3 1 1 7 16 11 10 6 10 15 14 9 14 8 10 32 
141 3 2 52 3 1 10 29 20 8 37 18 18 28 46 12 42 18 15 47 
142 3 1 60 3 1 1 29 13 12 17 12 16 29 25 17 32 13 19 90 
143 3 2 56 2 1 1 21 12 12 17 9 14 23 26 16 34 10 20 61 
144 3 2 54 2 2 2 18 28 9 14 12 16 27 42 18 37 11 15 36 
145 3 2 61 3 1 1 13 12 8 16 7 11 28 15 15 18 7 15 35 
146 3 2 61 3 1 1 29 18 12 22 11 14 30 25 17 39 13 19 48 
147 3 2 51 3 1 1 17 15 5 11 10 11 32 43 14 44 15 21 35 
148 3 1 49 4 2 1 19 17 7 16 3 6 27 27 18 43 15 20 35 
149 3 1 52 2 1 1 22 13 10 20 11 14 22 14 9 21 9 7 55 
150 3 2 55 3 2 1 25 13 12 16 10 14 26 17 6 31 13 9 35 
151 3 2 60 4 2 1 28 10 10 22 12 15 28 15 15 30 11 17 45 
152 3 2 43 1 2 2 14 9 5 15 5 6 16 16 12 14 10 14 33 
153 3 1 55 3 2 1 14 7 9 15 9 12 18 15 16 22 11 16 42 
154 3 2 52 4 2 1 18 11 9 15 8 12 23 23 15 40 13 18 32 
155 3 2 56 1 2 1 26 24 13 25 9 10 17 7 3 13 6 5 40 
156 3 1 56 4 1 1 24 11 1 28 14 17 27 14 15 40 12 21 42 


157 3 2 47 2 1 1 20 19 9 25 12 8 26 24 13 35 11 17 46 
158 3 2 56 2 1 1 17 18 8 17 5 9 24 17 10 19 10 15 39 
159 3 2 52 3 1 1 28 15 13 27 9 15 31 16 16 22 12 18 69 
160 3 2 51 4 1 1 23 14 1 23 8 11 31 37 17 42 14 18 36 
161 3 1 51 1 1 1 7 13 6 11 7 6 12 8 14 22 10 16 35 
162 3 2 53 3 1 1 15 15 8 18 8 11 29 32 12 28 10 14 34 
163 3 1 50 4 1 1 26 11 14 23 10 11 39 22 16 40 14 17 32 
164 3 2 59 4 1 1 16 10 8 21 9 12 25 22 14 31 10 18 38 
165 3 1 53 3 1 1 14 12 7 9 9 5 22 28 9 30 9 10 32 
166 3 1 55 3 1 1 15 10 7 9 6 11 24 20 14 27 9 9 34 
167 3 1 57 1 1| 1 6 13 2 8 7 18 6 4 0 1 4 35 
168 3 1 58 2 1 1 16 5 5 8 6 9 13 14 1 11 9 16 34 
169 3 1 44 3 1 1 10 12 4 9 10 11 ІЗ 15 3 8 3 5 28 
170 3 1 39 1 1 2 14 12 4 5 7 5 13 11 8 19 10 8 29 
171 3 1 53 4 2 1 21 17 12 16 10 13 27 20 14 29 15 16 37 
172 3 2 52 4 1 10 23 10 9 9 7 6 21 16 11 20 9 9 32 
173 3 1 57 3 1 2 25 11 10 19 11 13 28 29 20 25 16 19 35 
174 3 2 40 3 1 1 11 10 7 14 4 8 16 22 11 21 9 9 35 
175 3 2 47 2 1 1 16 B 7 7 6 9 22 13 4 18 П 9 32 
176 3 1 51 2 1 1 25 19 1 24 12 8 26 20 15 24 11 13 47 
177 3 1 48 2 1 1 11 7 4 14 3 13 П 12 8 27 11 10 35 
178 3 2 49 1 1 2 15 16 6 9 4 7 20 16 7 17 10 5 35 
179 3 1 50 2 1 1 R 8 5 17 8 10 18 19 12 13 12 17 30 
180 4 2 53 1 2 2 10 13 4 13 7 8 19 16 9 16 7 11 35 
181 4 2 52 1 2 2 13 15 8 19 8 9 21 11 8 16 7 ІІ 39 
182 4 1 51 1 2 2 19 12 9 17 8 12 27 16 12 27 11 16 39 
183 4 1 52 1 2 2 20 16 12 22 11 17 25 19 14 26 11 15 36 
184 4 1 46 1 2 2 13 3 3 1 4 4 24 11 10 13 10 13 27 
185 4 2 51 1 2 2 21 19 12 25 13 14 24 15 11 25 8 14 45 
186 4 2 47 1 2 2 19 12 13 27 8 11 24 14 15 21 7 13 28 
187 4 2 51 3 2 1 25 13 12 21 12 16 31 16 15 25 11 18 40 
188 4 2 54 1 2 1 8 20 5 8 7 6 14 13 11 11 8 10 47 
189 4 2 54 2 2 1 12 4 9 4 7 6 17 13 10 12 9 8 36 
190 4 1 57 2 2 1 24 11 10 28 12 11 30 24 18 26 14 16 39 
191 4 1 53 2 2 1 17 12 8 9 5 11 26 13 11 20 10 15 39 
192 4 2 50 2 2 1 20 16 8 18 9 13 28 25 15 15 9 12 43 
193 4 2 57 1 2 2 28 23 16 33 14 11 26 25 16 42 14 11 69 
194 4 2 58 1 2 2 31 30 12 44 14 17 32 43 16 44 11 13 69 
195 4 2 58 1 2 2 28 29 9 33 14 8 29 44 9 44 15 10 38 


196 4 2 53 1 2 2 19 19 14 24 11 16 21 13 9 31 10 16 39 
197 4 2 49 1 2 2 20 17 7 13 9 10 30 15 6 21 10 9 30 
198 4 2 51 1 2 2 10 1 2 2 4 0 13 0 0 0 0 0 34 
199 4 1 58 1 2 2 22 B 9 13 10 9 18 18 11 13 ІП 8 36 
200 4 2 51 1 2 2 18 12 4 10 5 9 17 10 8 14 5 10 48 
201 4 2 53 1 2 2 21 17 9 18 9 11 28 15 9 19 12 9 49 
202 4 2 56 3 2 1 29 17 17 32 10 20 30 33 17 38 12 20 49 
203 4 2 51 3 1 1 19 11 10 19 8 7 22 19 11 39 11 21 37 
204 4 1 47 1 1 1 23 12 11 14 11 13 29 15 13 22 14 16 45 
205 4 2 54 4 1 1 23 14 12 23 8 15 28 41 16 35 16 22 46 
206 4 1 54 4 1 1 17 15 6 15 4 1 24 30 13 42 17 20 32 
207 4 2 46 1 1 1 22 14 7 15 3 14 29 24 18 36 13 23 30 
208 4 2 52 2 1 1 20 15 14 19 5 13 27 45 16 38 17 22 35 
209 4 2 48 1 1 1 24 18 5 21 9 11 23 17 10 16 15 9 36 
210 4 1 49 2 1 2 17 21 7 23 9 4 13 14 13 35 15 13 45 
211 4 1 58 1 1 2 14 7 3 17 13 6 22 15 11 23 13 9 59 
212 4 1 46 3 1 2 18 13 10 11 7 9 22 14 13 23 10 13 42 
213 4 1 57 1 1 2 27 19 П 20 10 15 27 19 8 29 23 11 41 
214 4 1 48 4 1 2 27 12 15 23 11 16 27 17 13 27 13 10 39 
215 4 2 52 2 1 2 23 8 9 16 12 8 20 16 13 23 10 12 94 
216 4 1 57 2 1 1 29 17 12 24 12 16 31 32 12 17 9 10 55 
217 4 1 46 3 1 1 18 9 10 12 9 14 24 18 П 13 8 10 28 
218 4 1 55 2 1 1 14 12 8 14 11 1 27 40 18 35 16 21 32 
219 4 1 4 3 1 1 8 1] 6 10 6 9 31 23 15 18 10 13 28 
220 4 1 56 1 2 2 26 15 12 29 11 12 25 23 13 40 12 16 59 
221 4 2 44 2 1 1 14 14 12 10 10 12 25 23 16 26 13 12 27 
222 4 1 5 4 1 1 28 17 12 27 10 11 31 46 19 29 16 20 61 
223 5 1 48 2 1 1 16 8 8 Ө 8 8 24 1 9 IH 8 7 35 
224 5 1 56 2 1 1 22 17 11 23 14 13 30 20 17 38 13 21 58 
225 5 2 58 2 1 1 20 18 8 26 11 10 30 44 12 40 13 21 59 
226 5 2 53 1 1 1 15 11 2 8 10 5 18 19 10 14 6 8 34 
227 5 2 53 1 1 1 26 16 8 14 10 9 28 13 12 18 10 П 41 
228 5 2 65 1 1 2 15 16 5 24 12 12 22 15 12 26 11 19 44 
229 5 1 46 1 1 2 15 5 5 4 9 4 15 13 10 10 8 11 41 
230 5 1 49 1 1 2 19 12 12 16 13 15 28 16 14 36 14 16 59 
231 5 1 55 1 1 2 21 40 8 36 9 10 27 49 13 47 9 17 53 
232 5 2 46 2 1 10 20 9 6 17 10 П 29 13 12 23 12 8 31 
233 5 2 58 4 1 1 30 55 19 52 15 23 31 54 19 54 15 23 78 


234 5 1 47 4 1 1 18 13 9 20 8 11 28 34 17 33 11 18 43 
235 5 1 53 4 1 1 26 25 14 36 13 13 30 44 19 43 15 23 90 
236 5 2 51 2 1 1 30 15 8 12 10 10 30 33 12 45 12 20 49 
237 5 1 49 4 1 1 17 16 12 15 8 15 25 26 15 20 12 11 41 
238 5 1 43 2 1 1 16 13 6 1 8 9 22 19 10 10 9 7 30 
239 5 2 60 3 1 1 23 16 9 33 14 16 29 35 18 50 13 23 69 
240 5 1 51 4 1 1 21 11 10 27 10 12 25 32 17 47 11 19 65 
DESCRIPTION OF HEADACHE DATA SET 


This study investigated the effectiveness of different kinds of psychological treat- 
ment on the sensitivity of headache sufferers to noise. Each subject was exposed to 
the following sequence of operations: (1) measurement of initial sensitivity scores, 
(2) relaxation training, (3) treatment, and (4) measurement of final sensitivity 
scores. 

The sensitivity scores were obtained by having subjects listen to a tone that 
gradually increased in volume and asking them to rate the levels at which the tone 
became (1) uncomfortable and (2) definitely unpleasant. These levels, denoted by 
U and DU, are the dependent variables, with pretest scores on these variables use- 
ful for possible covariance analysis. 

Relaxation training was applied to all subjects and comprised two stages: (1) 
The subjects were asked to listen to the tone at their definitely unpleasant level for 
up to two minutes (with the option to terminate the exposure if they chose). (2) The 
subjects were then given instruction on breathing techniques and the use of visual 
imagery to act as a controlled distraction. 

There were two types of headache sufferers in the study: (a) migraine and (b) 
tension. Within each of these groups subjects were randomly assigned to one of the 
following four treatment groups: 


T1: subjects in this group listened to the tone again at their definitely 
    unpleasant (DU) level for the length of time that they were able to stand 
    it in stage (1) of the relaxation training. 

T2: as T1 but with one extra minute's exposure to the tone. 

T3: as T2 but having been instructed to use the relaxation techniques of 
    breathing and imagery. 

T4: this was a control group, in that the subjects experienced no exposure 
    to the tone between stage (1) of the relaxation training and the final 
    sensitivity measures. 


Some missing data reduced an intended balanced design to the following 2 x 4 
factorial design, with cell sizes indicated: 


            T1   T2   T3   T4 
MIGRAINE    11   11   12   11 
TENSION     14   11   16   12 
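
The cell sizes above determine the marginal totals and the overall sample size. As a quick check, the short Python sketch below tallies them. It is offered only as an illustration (Python is not one of the packages used in this book), and the dictionary layout and variable names are assumptions made for the example.

```python
# Cell sizes copied from the 2 x 4 table above
# (rows: headache type, columns: treatment group).
cell_sizes = {
    "MIGRAINE": {"T1": 11, "T2": 11, "T3": 12, "T4": 11},
    "TENSION": {"T1": 14, "T2": 11, "T3": 16, "T4": 12},
}

# Marginal totals for each headache type and each treatment group.
row_totals = {htype: sum(cells.values()) for htype, cells in cell_sizes.items()}
col_totals = {
    t: sum(cells[t] for cells in cell_sizes.values())
    for t in ("T1", "T2", "T3", "T4")
}
total_n = sum(row_totals.values())

# A design is balanced only if every cell holds the same number of subjects.
distinct_sizes = {n for cells in cell_sizes.values() for n in cells.values()}
is_balanced = len(distinct_sizes) == 1

print(row_totals)
print(col_totals)
print(total_n, is_balanced)
```

With unequal cell sizes such as these, the design is nonorthogonal, which is why the tally is worth doing before choosing an analysis.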


HEADACHE DATA 

HEADACHE  TREAT  PRE     PRE                        HEADACHE  TREAT  PRE     PRE 
TYPE      GP     UNCOMF  DEFUNPL  UNCOMF  DEFUNPL   TYPE      GP     UNCOMF  DEFUNPL  UNCOMF  DEFUNPL 
1 3 234 530 5.80 8.52 1 1 273 685 4.68 6.68 
2 1 0.37 0.53 055 0.84 І 3 750 912 5.70 7.88 
1 3 463 721 5.63 6.75 1 3 360 730 483 7.32 
1 2 245 3.75 250 3.18 1 1 2.31 3.25 2.00 3.30 
1 1 138 2.33 223 3.98 2 3 085 142 137 1.89 
1 3 185 3.25 3.40 480 1 2 190 868 225 670 
2 1 6.00 9.90 825 107 1 2 156 2.92 200 2.84 
2 1 2.95 4.98 3.85 4.75 2 1 205 345 175 230 
2 2 668 990 852 128 1 4 172 275 2.20 3.95 
2 з 3.90 650 327 7.80 1 3 040 090 140 230 
1 1 2.19 260 2.50 3.50 2 4 140 182 2.10 3.90 
2 3 3.22 565 2.70 480 2 з 3.50 660 465 8.00 
2 2 315 525 530 7.60 1 4 196 3.18 1.20 3.15 
2 2 255 405 400 5.45 2 1 185 3.30 180 3.15 
2 4 185 320 142 2.62 2 1 150 175 135 3.40 
2 4 432 615 4.98 6.45 2 1 243 7.95 4.08 6.83 
1 2 342 5.59 450 7418 2 з 3.70 5488 313 4.00 
2 3 257 440 327 8464 1 3 162 340 403 570 
2 2 46 682 3.45 6.24 1 2 112 139 106 178 
2 4 410 765 336 6.58 1 4 265 488 120 3.50 
1 4 066 100 043 0.60 2 4 208 330 244 347 
2 2 338 827 707 0.90 2 4 386 594 320 48 
2 2 239 460 293 5.42 2 з 362 883 512 7.71 
1 4 186 4.06 178 244 2 3 219 3.94 231 348 
2 4 208 264 171 299 2 4 151 280 124 2.63 
1 2 1.60 2.83 187 3.00 1 3 0.75 0.94 0.88 145 
1 1 0.87 1.16 0.59 0.95 1 3 192 244 200 2.54 
2 4 142 647 200 348 1 4 182 2.57 0.64 107 
2 1 186 2.74 0.89 141 1 4 124 269 095 176 
1 3 090 141 156 21 2 3 124 323 1.36 2.86 
1 4 151 2.79 083 1464 1 1 1.6 869 235 751 
2 3 144 306 111 2.58 2 2 046 112 093 1.36 
1 2 220 525 104 319 2 2 129 232 156 530 
2 2 234 425 216 4.10 1 1 086 155 0.88 2.14 
1 2 5.91 856 2.62 6.08 2 2 143 3.94 3.61 757 
2 3 151 163 155 164 2 1 055 110 1.80 3.92 
2 3 742 145 815 133 1 4 340 510 2.80 4.40 
2 3 152 235 120 2.55 2 1 185 5.68 775 161 
1 1 225 440 1.56 4.93 1 2 422 133 122 141 
2 3 330 455 525 5.83 2 3 430 113 8.78 12.38 
1 3 358 5.60 6.94 9.16 2 3 6417 15.5 7.54 16.24 
2 1 043 064 022 0.39 1 1 133 103 167 3.79 
2 1 0.87 130 110 145 1 2 505 1005 3.02 10.1 
2 1 188 419 179 426 1 1 8.94 1546 3.64 9.00 
1 1 2.50 464 223 3.60 2 1 133 1700 11.87 16.6 
2 4 325 509 124 21 1 4 10.36 170 11.50 15.56 
1 2 1.08 149 1.09 1.82 2 4 3.01 1235 401 7.51 
1 4 071 143 058 0.86 2 2 2.00 837 10.08 5.49 
1 3 1272 165 152 168 2 4 113 168 6.75 16.87 
DESCRIPTION OF THE CARTOON DATA SET 


This is a data set on 179 subjects from the Minitab Handbook (2nd ed., 1985), and 
is used with permission of the publisher. A short instructional slide presentation 
was developed, which dealt with the behavior of people in a group situation, and in 
particular the various roles or character types that group members often assume. 
The presentation consisted of a 5 minute lecture on tape, accompanied by 18 
slides. Each role was identified by an animal. Each animal was shown on two 
slides: once in a cartoon sketch and once in a realistic picture. All 179 subjects saw 
all 18 slides, but a randomly selected half of them saw the slides in black and white 
while the other half saw the slides in color. 

After seeing the slides, the subjects took a test on the material. The slides were 
presented in random order, and the subjects wrote down the character type repre- 
sented by that slide. They received two scores: one for the number of cartoon char- 
acters correctly identified and one for the number of realistic characters correctly 
identified. Each score could range from 0 to 9, since there were 9 characters. Four 
weeks later the subjects were retested. Some subjects did not show up for the retest 
and that is indicated by a blank. 

There are three groups of subjects in this study: (1) preprofessional personnel at 
three hospitals in Pennsylvania involved in an in-service training program, (2) pro- 
fessional personnel involved in the same training program, and (3) a group of Penn 
State undergraduate students. All these subjects were given the Otis Mental Ability 
Test, which yields a rough estimate of their natural ability. 

The order in which the variables are arranged on the file is as follows: 


Variable No.  Variable Name  Description 

1             ID             Identification number 

2             COLOR          0 = black and white, 1 = color (no participant saw both) 

3             ED             Education: 0 = preprofessional, 1 = professional, 2 = college 
                             student 

4             LOCATION       Location: 1 = hospital A, 2 = hospital B, 3 = hospital C, 4 = Penn 
                             State student 

5             OTIS           OTIS score: from about 70 to about 130 

6             CARTOON1       Score on cartoon test given immediately after presentation 
                             (possible scores are 0, 1, 2, ..., 9) 

7             REAL1          Score on realistic test given immediately after presentation 
                             (possible scores are 0, 1, 2, ..., 9) 

8             CARTOON2       Score on cartoon test given four weeks (delayed) after 
                             presentation (possible scores are 0, 1, 2, ..., 9; a blank is used 
                             for a missing observation) 

9             REAL2          Score on realistic test given four weeks (delayed) after 
                             presentation (possible scores are 0, 1, 2, ..., 9; a blank is used 
                             for a missing observation) 
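
Because missing retest scores are recorded as blanks, a program that reads the file by splitting on whitespace sees short records for subjects who missed the retest. The Python sketch below shows one way to handle that. It rests on a loud assumption that is not stated in the description above: a record with only seven values is taken to be missing both delayed scores, and any other short record is rejected as ambiguous. The function name `parse_cartoon_record` and the two sample lines are illustrative only.

```python
# Variable names in file order, following the variable list above.
CARTOON_COLUMNS = [
    "ID", "COLOR", "ED", "LOCATION", "OTIS",
    "CARTOON1", "REAL1", "CARTOON2", "REAL2",
]


def parse_cartoon_record(line):
    """Parse one subject's values; blank delayed scores become None.

    Assumption: a 7-value record is missing both CARTOON2 and REAL2.
    An 8-value record is ambiguous (which score is blank?) and is rejected.
    """
    values = [int(field) for field in line.split()]
    if len(values) == 7:
        values += [None, None]
    elif len(values) != 9:
        raise ValueError(f"cannot align {len(values)} values to 9 variables")
    return dict(zip(CARTOON_COLUMNS, values))


# Sample input only: one subject without a retest and one with it.
records = [
    parse_cartoon_record("80 0 1 3 90 4 3"),
    parse_cartoon_record("118 0 2 4 97 8 8 6 4"),
]
retested_ids = [r["ID"] for r in records if r["CARTOON2"] is not None]
print(retested_ids)
```

Reading the file by fixed column positions instead of whitespace would remove the ambiguity, but that requires the original column layout, which is not given here.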
CARTOON DATA 


[The listing for subjects 1 through 78 is not legible in this copy.] 




ID COLOR ED LOCATION OTIS CARTOON1 REAL1 CARTOON2 REAL2   ID COLOR ED LOCATION OTIS CARTOON1 REAL1 CARTOON2 REAL2 
7 0 1 3 95 7 4 118 0 2 4 97 8 8 6 4 
80 0 1 3 90 4 3 119 0 2 4 123 9 9 7 4 
81 0 1 3 86 1 0 1200 0 2 4 113 8 7 6 6 
82 0 1 3 104 6 4 121 0 2 4 110 8 7 3 5 
83 | 1 | 11 9 9 6 3 122 0 2 4 119 8 7 6 6 
84 | 1 1 105 1 0 123 0 2 4 116 5 7 
85 1 1 1 110 1 0 0 0 1244 0 2 4 113 8 6 5 5 
86 1 1 1 8 0 0 0 0 125 0 2 4 128 9 9 
87 1 1 1 78 4 1 1 1 126 0 2 4 113 8 5 4 2 
88 1 1 2 1209 9 127 0 2 4 110 5 7 
89 | 1 2 10 9 6 6 5 128 0 2 4 114 7 6 5 5 
90 | 1 2 107 8 6 129 0 2 4 132 9 8 4 6 
91 1 1 2 125 7 8 130 0 2 4 110 7 8 2 5 
92 1 1 2 117 9 9 131 0 2 4 122 т 7 4 2 
93. d 1 2 126 8 8 5 5 132 0 2 4 123 9 9 6 7 
94 1 1 2 98 4 5 133 0 2 4 131 9 9 7 7 
95 | 1 2 HI 8 6 134 0 2 4 BI 9 9 8 8 
96 | 1 2 10 8 7 135 0 2 4 1219 8 7 8 
97 1 1 2 120 9 7 136 0 2 4 125 9 8 
98 1 1 2 114 8 7 6 4 137 0 2 4 101 6 б 4 
99 1 1 2 117 6 7 138 0 2 4 120 8 9 6 7 
100 1 1 3 105 7 6 139 0 2 4 99 9 6 
101 1 1 3 97 6 6 1440 0 2 4 #128 8 9 8 7 
102 1 1 3 86 1 1 141 0 2 4 129 8 6 5 2 
103 1 rt 3 14.7 5 142 0 2 4 125 8 6 7 4 
104 1 1 3 93 1 0 133 0 2 4 107 8 8 8 5 
105 1 I > 115 8$ 7 144 0 2 4 102 8 7 6 4 
106 1 1 3 102 2 3 5 2 145 0 2 4 125 9 8 
107 1 1 3 п1 7 3 4 4 146 1 2 4 129 8 8 
108 1 1 3 82 1 1 1447 1 2 4 122 3 0 2 3 
109 1 1 3 117 8 5 42 3 148 1 2 4 124 7 6 6 7 
110 0 2 4 132 9 9 1449 1 2 4 115 8 8 
111 O 2 4 13 7 8 150 1 2 4 17 8 6 5 2 
112 0 2 4 1309 7 1 4 151 1 2 4 132 7 6 5 7 
13 0 2 4 12 9 9 6 4 152 1 2 4 109 8 5 5 5 
114 0 2 4 10 7 5 3 0 153 1 2 4 1079 5 9 2 
15 0 2 4 103 7 5 3 0 154 1 2 4 116 8 7 6 5 
116 0 2 4 118 9 9 155 1 2 4 18 8 5 6 5 
117 0 2 4 119 9 9 7 8 156 1 2 4 124 9 9 6 7 
157 1 2 4 102 9 5 5 2 160 1 2 4 133 8 8 7 7 
158 1 2 4 109 7 7 7 1701 2 4 156 4 5 3 
159 1 2 4 119 7 5 2 4 171 1 2 4 1247 7 9 8 
1601 2 4 99 3 2 4 0 121 2 4 11229 9 9 8 
161 1 2 4 1022 7 8 5 6 173 1 2 4 1289 9 9 9 
162 1 2 4 115 7 7 174 1 2 4 96 8 8 7 6 
163 1 2 4 105 8 6 3 0 175 1 2 4 110 8 8 4 5 
164 1 2 4 104 7 6 176 1 2 4 108 8 8 6 8 
165 1 2 4 12 7 7 177 1 2 4 12557 6 8 8 
1661 2 4 117 9 9 6 5 178 1 2 4 111 4 3 4 1 
167 1 2 4 108 9 9 1799 1 2 4 103 4 3 2 1 

168 1 2 4 135 8 8 8 8 
DESCRIPTION OF ATTITUDE DATA SET 


Data were collected on 189 third through sixth graders from a suburban midwestern 
public school. The children were measured on preference toward the following 
subjects: mathematics, language arts, science, reading, and social studies. An 
intervention was then employed with the five teachers to change these preferences 
(attitudes) in a positive way, and then the children were measured again on subject 
preference four months later. 

The intervention consisted of a three hour lecture and discussion (in-service 
work) by a professor in September with the teachers on what shapes attitudes and 
how they could go about changing the attitudes of their students. There was a par- 
ticular emphasis in this school on changing the mathematics attitude. The profes- 
sor met again with the teachers in December to discuss whether they had imple- 
mented some of the changes he had suggested. 
ATTITUDE DATA 
DESCRIPTION OF NATIONAL ACADEMY OF SCIENCES DATA 


The following data are from a 1982 National Academy of Sciences published report 
rating the "scholarly quality" of research programs in the humanities, physical 
sciences, and social sciences. The ratings were based on the rankings of quality and 
reputation made by senior faculty in the field who taught at institutions other than 
the one being rated. 

The data to be presented are the quality ratings of 46 research doctorate 
programs in psychology, as well as six potential correlates of the quality ratings. Here 
is a description of the variables: 

QUALITY   Mean rating of scholarly quality of program faculty 
NFACULTY  Number of faculty members in program as of December 1980 
NGRADS    Number of program graduates from 1975 through 1980 
PCTSUPP   Percentage of program graduates from 1975-1979 who received fellowships 
          or training grant support during their graduate education 
PCTGRANT  Percent of faculty members holding research grants from the Alcohol, Drug 
          Abuse and Mental Health Administration, the National Institutes of Health, or 
          the National Science Foundation at any time during 1978-1980 
NARTICLE  Number of published articles attributed to program faculty members, 
          1978-1980 
PCTPUB    Percent of faculty with one or more published articles from 1978-1980 
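
Each line of the listing holds an observation number, a program name that may contain spaces, and the seven numeric variables described above. One way to read such a line is to take the numeric values from the right-hand end. The Python sketch below illustrates this. It is only a sketch under stated assumptions: Python is not one of the packages used in this book, the function name `parse_program` is hypothetical, and it assumes that every line ends in exactly seven integer values.

```python
# The seven numeric variables, in the order described above.
NAS_COLUMNS = [
    "QUALITY", "NFACULTY", "NGRADS", "PCTSUPP",
    "PCTGRANT", "NARTICLE", "PCTPUB",
]


def parse_program(line):
    """Split a line into observation number, program name, and 7 values."""
    parts = line.split()
    if len(parts) < len(NAS_COLUMNS) + 2:
        raise ValueError("line is too short to hold a name and 7 values")
    n = len(NAS_COLUMNS)
    record = {"OBS": int(parts[0]), "NAME": " ".join(parts[1:-n])}
    # The last seven tokens are the numeric variables; everything between
    # the observation number and those tokens is the program name.
    record.update(zip(NAS_COLUMNS, (int(p) for p in parts[-n:])))
    return record


# Sample input only: two lines typed in from the listing.
harvard = parse_program("16 HARVARD 46 27 113 62 52 173 85")
chicago = parse_program("8 UNIV OF CHICAGO 42 60 57 65 40 187 82")
print(chicago["NAME"], chicago["NARTICLE"])
```

Counting fields from the right avoids having to know in advance how many words a program name contains.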
OBS NAME QUALITY NFACULTY NGRADS PCTSUPP PCTGRANT NARTICLE PCTPUB 
1 ADELPHI 12 13 19 16 8 14 39 
2 ARIZONA-TUSCON 23 29 72 67 3 61 66 
3 BOSTON UNIV 29 38 111 66 13 68 68 
4 BROWN 36 16 28 52 63 49 75 
5 UCBERKELEY 44 40 104 64 53 130 83 
6 UCRIVERSIDE 21 14 28 59 29 65 79 
7 CARNEGIE MELLON 40 44 16 81 35 79 82 
8 UNIV OF CHICAGO 42 60 57 65 40 187 82 
9 CLARK UNIV 24 16 18 87 19 32 75 
10 COLUMBIA TEACHERS 30 37 41 43 8 50 54 
11 DELAWARE, UNIV OF 20 20 45 26 25 49 50 
12 DETROIT, UNIV OF 8 11 21 7 0 9 21 
13 FLORIDA ST-TALAH 28 29 112 64 35 65 69 
14 FULLER THEOL SEMIN 14 14 57 10 0 11 43 
15 UNIV OF GEORGIA 27 38 167 28 13 196 84 
16 HARVARD 46 27 113 62 52 173 85 
17 HOUSTON, UNIV OF 29 32 122 51 119 79 69 
18 UNIV ILLINOIS-CHAMP 42 56 116 56 32 208 73 
19 IOWA, UNIV OF 33 32 54 49 10 120 69 
20 KANSAS, UNIV OF 31 42 79 41 14 14 71 
21 KENT STATE UNIV 23 30 76 22 20 87 67 
22 LOUISIANA STATE 18 18 62 39 6 10 39 
23 UNIV OF MARYLAND 29 41 98 41 12 101 66 
24 MIAMI UNIV 21 23 52 33 4 59 78 
25 UMICH-ANN ARB 45 111 222 64 32 274 70 
26 U MISSOURI 25 26 63 39 23 160 89 
27 U NEW HAMPSHIRE 18 16 24 4 31 39 63 
28 NEW YORK UNIV 33 38 154 55 34 84 63 
29 UNC—GREENSBORO 21 19 40 7 5 60 84 
30 NORTHEASTERN 24 16 18 25 63 31 63 
31 NOTRE DAME 15 13 29 23 15 62 85 
32 OKLA ST-STILLWATER 15 23 41 51 4 24 57 
33 PENN STATE 36 32 69 65 16 122 75 
34 PRINCETON 38 21 38 28 48 92 91 
35 UNIV OF ROCHESTER 32 28 90 70 36 117 61 
36 SUNY ALBANY 21 22 52 10 21 114 86 
37 ST LOUIS UNIVERSITY 16 20 80 46 10 19 40 
38 UNIV SOUTH FLORIDA 26 32 41 13 6 64 56 
39 STANFORD 48 26 81 70 58 155 100 
40 TEMPLE 26 40 81 42 10 70 68 


41 TEXAS TECH LUBBOCK 14 19 87 15 5 72 79 
42 UNIV OF TOLEDO 12 17 26 9 6 15 59 
43 UNIV OF UTAH, SALT L 29 29 71 74 17 85 76 
44 VIRGINIA POLYTECH 34 27 20 0 29 79 57 
45 WASHINGTON UNIV-ST.L 28 26 70 68 27 84 73 
46 UNIV WISC—MADISON 39 36 59 57 67 172 83 


Jones, L. V., Lindzey, G., & Coggeshall, P. (Eds.) (1982). An Assessment of Research-Doctorate 
Programs in the United States: Social and Behavioral Sciences. Washington, DC: National Academy 
Press. 
AGRESTI HOME SALES DATA 

price size nobed nobath new 

1 48.50 1.10 3.00 1.00 .00 

2 55.00 1.01 3.00 2.00 .00 

3 68.00 1.45 3.00 2.00 .00 

4 137.00 2.40 3.00 3.00 .00 

5 309.40 3.30 4.00 3.00 1.00 

6 17.50 .40 1.00 1.00 .00 

7 19.60 1.28 3.00 1.00 .00 

8 24.50 .74 3.00 1.00 .00 

9 34.80 .78 2.00 1.00 .00 
10 32.00 .97 3.00 1.00 .00 
11 28.00 .84 3.00 1.00 .00 
12 49.90 1.08 2.00 2.00 .00 
13 59.90 .99 2.00 1.00 .00 
14 61.50 1.01 3.00 2.00 .00 
15 60.00 1.34 3.00 2.00 .00 
16 65.90 1.22 3.00 1.00 .00 
17 67.90 1.28 3.00 2.00 .00 
18 68.90 1.29 3.00 2.00 .00 
19 69.90 1.52 3.00 2.00 .00 
20 70.50 1.25 3.00 2.00 .00 
21 72.90 1.28 3.00 2.00 .00 
22 72.50 1.28 3.00 1.00 .00 
23 72.00 1.36 3.00 2.00 .00 
24 71.00 1.20 3.00 2.00 .00 
25 76.00 1.46 3.00 2.00 .00 
26 72.90 1.56 4.00 2.00 .00 
27 73.00 1.22 3.00 2.00 .00 
28 770.00 1.40 2.00 2.00 .00 
29 76.00 1.15 2.00 2.00 .00 
30 69.00 1.74 3.00 2.00 .00 
31 75.50 1.62 3.00 2.00 .00 
32 76.00 1.66 3.00 2.00 .00 
33 81.80 1.33 3.00 2.00 .00 
34 84.50 1.34 3.00 2.00 .00 
35 83.50 1.40 3.00 2.00 .00 
36 86.00 1.15 2.00 2.00 1.00 
37 86.90 1.58 3.00 2.00 1.00 
38 86.90 1.58 3.00 2.00 1.00 
39 86.90 1.58 3.00 2.00 1.00 
40 87.90 1.71 3.00 2.00 .00 
41 88.10 2.10 3.00 2.00 .00 
42 85.90 1.27 3.00 2.00 .00 
43 89.50 1.34 3.00 2.00 .00 
44 87.40 1.25 3.00 2.00 .00 
45 87.90 1.68 3.00 2.00 .00 


46 88.00 1.55 3.00 2.00 .00 
47 90.00 1.55 3.00 2.00 .00 
48 96.00 1.36 3.00 2.00 1.00 
49 99.90 1.51 3.00 2.00 1.00 
50 95.50 1.54 3.00 2.00 1.00 
51 98.50 1.51 3.00 2.00 .00 
52 100.10 1.85 3.00 2.00 .00 
53 99.90 1.62 4.00 2.00 1.00 
54 101.90 1.40 3.00 2.00 1.00 
55 101.90 1.92 4.00 2.00 .00 
56 102.30 1.42 3.00 2.00 1.00 
57 110.80 1.56 3.00 2.00 1.00 
58 105.00 1.43 3.00 2.00 1.00 
59 97.90 2.00 3.00 2.00 .00 
60 106.30 1.45 3.00 2.00 1.00 
61 106.50 1.65 3.00 2.00 .00 
62 116.00 1.72 4.00 2.00 1.00 
63 108.00 1.79 4.00 2.00 1.00 
64 107.50 1.85 3.00 2.00 .00 
65 109.90 2.06 4.00 2.00 1.00 
66 110.00 1.76 4.00 2.00 .00 
67 120.00 1.62 3.00 2.00 1.00 
68 115.00 1.80 4.00 2.00 1.00 
69 113.40 1.98 3.00 2.00 .00 
70 114.90 1.57 3.00 2.00 .00 
71 115.00 2.19 3.00 2.00 .00 
72 115.00 2.07 4.00 2.00 .00 
73 117.90 1.99 4.00 2.00 .00 
74 110.00 1.55 3.00 2.00 .00 
75 115.00 1.67 3.00 2.00 .00 
76 124.00 2.40 4.00 2.00 .00 
77 129.90 1.79 4.00 2.00 1.00 
78 124.00 1.89 3.00 2.00 .00 
79 128.00 1.88 3.00 2.00 1.00 
80 132.40 2.00 4.00 2.00 1.00 
81 139.30 2.05 4.00 2.00 1.00 
82 139.30 2.00 4.00 2.00 1.00 
83 139.70 2.03 3.00 2.00 1.00 
84 142.00 2.12 3.00 3.00 .00 
85 141.30 2.08 4.00 2.00 1.00 
86 147.50 2.19 4.00 2.00 .00 
87 142.50 2.40 4.00 2.00 .00 
88 148.00 2.40 5.00 2.00 .00 
89 149.00 3.05 4.00 2.00 .00 
90 150.00 2.04 3.00 3.00 .00 
91 172.90 2.25 4.00 2.00 1.00 
92 190.00 2.97 4.00 3.00 1.00 
93 280.00 3.85 4.00 3.00 .00 
TABLE B.1 
Critical Values for F 


df for Numerator 


df error α 1 2 3 4 5 6 8 12 
1 .01 4052 4999 5403 5625 5764 5859 5981 6106 
.05 161.45 199.50 215.71 224.58 230.16 233.99 238.88 243.91 
.10 39.85 49.50 53.59 55.83 57.24 58.20 59.44 60.70 
.20 9.47 12.00 13.06 13.73 14.01 14.26 14.59 14.90 
2 .01 98.49 99.00 99.17 99.25 99.30 99.33 99.36 99.42 
.05 18.51 19.00 19.16 19.25 19.30 19.33 19.37 19.41 
.10 8.53 9.00 9.16 9.24 9.29 9.33 9.37 9.41 
.20 3.56 4.00 4.16 4.24 4.28 4.32 4.36 4.40 
3 .001 167.5 148.5 141.1 137.1 134.6 132.8 130.6 128.3 
.01 34.12 30.81 29.46 28.71 28.24 27.91 27.49 27.05 
.05 10.13 9.55 9.28 9.12 9.01 8.94 8.84 8.74 
.10 5.54 5.46 5.39 5.34 5.31 5.28 5.25 5.22 
.20 2.68 2.89 2.94 2.96 2.97 2.97 2.98 2.98 
4 .001 74.14 61.25 56.18 53.44 51.71 50.53 49.00 47.41 
.01 21.20 18.00 16.69 15.98 15.52 15.21 14.80 14.37 
.05 7.71 6.94 6.59 6.39 6.26 6.16 6.04 5.91 
.10 4.54 4.32 4.19 4.11 4.05 4.01 3.95 3.90 
.20 2.35 2.47 2.48 2.48 2.48 2.47 2.47 2.46 
5 .001 47.04 36.61 33.20 31.09 29.75 28.84 27.64 26.42 
.01 16.26 13.27 12.06 11.39 10.97 10.67 10.29 9.89 
.05 6.61 5.79 5.41 5.19 5.05 4.95 4.82 4.68 
.10 4.06 3.78 3.62 3.52 3.45 3.40 3.34 3.27 
.20 2.18 2.26 2.25 2.24 2.23 2.22 2.20 2.18 
6 .001 35.51 27.00 23.70 21.90 20.81 20.03 19.03 17.99 
.01 13.74 10.92 9.78 9.15 8.75 8.47 8.10 7.72 
.05 5.99 5.14 4.76 4.53 4.39 4.28 4.15 4.00 
.10 3.78 3.46 3.29 3.18 3.11 3.05 2.98 2.90 
.20 2.07 2.13 2.11 2.09 2.08 2.06 2.04 2.02 
7 .001 29.22 21.69 18.77 17.19 16.21 15.52 14.63 13.71 
.01 12.25 9.55 8.45 7.85 7.46 7.19 6.84 6.47 
.05 5.59 4.74 4.35 4.12 3.97 3.87 3.73 3.57 
.10 3.59 3.26 3.07 2.96 2.88 2.83 2.75 2.67 
.20 2.00 2.04 2.02 1.99 1.97 1.96 1.93 1.91 

8 001 25.42 18.49 15.83 14.39 13.49 12.86 12.04 11.19 
01 11.26 8.65 7.59 7.01 6.63 6.37 6.03 5.67 

.05 5.32 4.46 4.07 3.84 3.69 3.58 3.44 3.28 

.10 3.46 3.11 2.92 2.81 2.73 2.67 2.59 2.50 

20 1.95 1.98 1.95 1.92 1.90 1.88 1.86 1.83 

9 001 22.86 16.39 13.90 12.56 11.71 11.13 10.37 9.57 
.01 10.56 8.02 6.99 6.42 6.06 5.80 5.47 5.11 

.05 5.12 4.26 3.86 3.63 3.48 3.37 3.23 3.07 

.10 3.36 3.01 2.81 2.69 2.61 2.55 2.47 2.38 

20 1.91 1.94 1.90 1.87 1.85 1.83 1.80 1.76 

10 .001 21.04 14.91 12.55 11.28 10.48 9.92 9.20 8.45 
01 10.04 7.56 6.55 5.99 5.64 5.39 5.06 4.71 

.05 4.96 4.10 3.71 3.48 3.33 3.22 3.07 2.91 

10 3.28 2.92 2.73 2.61 2.52 2.46 2.38 2.28 

.20 1.88 1.90 1.86 1.83 1.80 1.78 1.75 1.72 

11 001 19.69 13.81 11.56 10.35 9.58 9.05 8.35 7.63 
01 9.65 7.20 6.22 5.67 5.32 5.07 4.74 4.40 

.05 4.84 3.98 3.59 3.36 3.20 3.09 2.95 2.79 

10 3.23 2.86 2.66 2.54 2.45 2.39 2.30 2.21 

.20 1.86 1.87 1.83 1.80 1.77 1.75 1.72 1.68 

12 .001 18.64 12.97 10.80 9.63 8.89 8.38 7.71 7.00 
01 9.33 6.93 5.95 5.41 5.06 4.82 4.50 4.16 

.05 4.75 3.88 3.49 3.26 3.11 3.00 2.85 2.69 

.10 3.18 2.81 2.61 2.48 2.39 2.33 2.24 2.15 

20 1.84 1.85 1.80 1.77 1.74 1.72 1.69 1.65 

13 001 17.81 12.31 10.21 9.07 8.35 7.86 7.21 6.52 
01 9.07 6.70 5.74 5.20 4.86 4.62 4.30 3.96 

05 4.67 3.80 3.41 3.18 3.02 2.92 2.77 2.60 

10 3.14 2.76 2.56 2.43 2.35 2.28 2.20 2.10 

20 1.82 1.83 1.78 1.75 1.72 1.69 1.66 1.62 

14 001 17.14 11.78 9.73 8.62 7.92 7.43 6.80 6.13 
.01 8.86 6.51 5.56 5.03 4.69 4.46 4.14 3.80 

.05 4.60 3.74 3.34 3.11 2.96 2.85 2.70 2.53 

.10 3.10 2.73 2.52 2.39 2.31 2.24 2.15 2.05 

20 1.81 1.81 1.76 1.73 1.70 1.67 1.64 1.60 


15 001 16.59 11.34 9.34 8.25 7.57 7.09 6.47 5.81 
01 8.68 6.36 5.42 4.89 4.56 4.32 4.00 3.67 
05 4.54 3.68 3.29 3.06 2.90 2.79 2.64 2.48 
.10 3.07 2.70 2.49 2.36 2.27 2.21 2.12 2.02 
.20 1.80 1.79 1.75 1.71 1.68 1.66 1.62 1.58 
16 001 16.12 10.97 9.00 7.94 7.27 6.81 6.19 5.55 
.01 8.53 6.23 5.29 4.77 4.44 4.20 3.89 3.55 
05 4.49 3.63 3.24 3.01 2.85 2.74 2.59 2.42 
10 3.05 2.67 2.46 2.33 2.24 2.18 2.09 1.99 
20 1.79 1.78 1.74 1.70 1.67 1.64 1.61 1.56 
17 001 15.72 10.66 8.73 7.68 7.02 6.56 5.96 5.32 
01 8.40 6.11 5.18 4.67 4.34 4.10 3.79 3.45 
.05 4.45 3.59 3.20 2.96 2.81 2.70 2.55 2.38 
.10 3.03 2.64 2.44 2.31 2.22 2.15 2.06 1.96 
.20 1.78 1.77 1.72 1.68 1.65 1.63 1.59 1.55 
18 .001 15.38 10.39 8.49 7.46 6.81 6.35 5.76 5.13 
.01 8.28 6.01 5.09 4.58 4.25 4.01 3.71 3.37 
05 4.41 3.55 3.16 2.93 2.77 2.66 2.51 2.34 
10 3.01 2.62 2.42 2.29 2.20 2.13 2.04 1.93 
.20 1.77 1.76 1.71 1.67 1.64 1.62 1.58 1.53 
19 .001 15.08 10.16 8.28 7.26 6.61 6.18 5.59 4.97 
.01 8.18 5.93 5.01 4.50 4.17 3.94 3.63 3.30 
.05 4.38 3.52 3.13 2.90 2.74 2.63 2.48 2.31 
.10 2.99 2.61 2.40 2.27 2.18 2.11 2.02 1.91 
.20 1.76 1.75 1.70 1.66 1.63 1.61 1.57 1.52 
20 001 14.82 9.95 8.10 7.10 6.46 6.02 5.44 4.82 
01 8.10 5.85 4.94 4.43 4.10 3.87 3.56 3.23 
05 4.35 3.49 3.10 2.87 2.71 2.60 2.45 2.28 
.10 2.97 2.59 2.38 2.25 2.16 2.09 2.00 1.89 
20 1.76 1.75 1.70 1.65 1.62 1.60 1.56 1.51 
21 001 14.59 9.77 7.94 6.95 6.32 5.88 5.31 4.70 
.01 8.02 5.78 4.87 4.37 4.04 3.81 3.51 3.17 
.05 4.32 3.47 3.07 2.84 2.68 2.57 2.42 2.25 
.10 2.96 2.57 2.36 2.23 2.14 2.08 1.98 1.88 
.20 1.75 1.74 1.69 1.65 1.61 1.59 1.55 1.50 
22 001 14.38 9.61 7.80 6.81 6.19 5.76 5.19 4.58 
01 7.94 5.72 4.82 4.31 3.99 3.76 3.45 3.12 
.05 4.30 3.44 3.05 2.82 2.66 2.55 2.40 2.23 
10 2.95 2.56 2.35 2.22 2.13 2.06 1.97 1.86 
.20 1.75 1.73 1.68 1.64 1.61 1.58 1.54 1.49 
23 001 14.19 9.47 7.67 6.69 6.08 5.65 5.09 4.48 
01 7.88 5.66 4.76 4.26 3.94 3.71 3.41 3.07 
05 4.28 3.42 3.03 2.80 2.64 2.53 2.38 2.20 
.10 2.94 2.55 2.34 2.21 2.11 2.05 1.95 1.84 
.20 1.74 1.73 1.68 1.63 1.60 1.57 1.53 1.49 
24 .001 14.03 9.34 7.55 6.59 5.98 5.55 4.99 4.39 
01 7.82 5.61 4.72 4.22 3.90 3.67 3.36 3.03 
05 4.26 3.40 3.01 2.78 2.62 2.51 2.36 2.18 
10 2.93 2.54 2.33 2.19 2.10 2.04 1.94 1.83 
.20 1.74 1.72 1.67 1.63 1.59 1.57 1.53 1.48 
25 001 13.88 9.22 7.45 6.49 5.88 5.46 4.91 4.31 
.01 7.77 5.57 4.68 4.18 3.86 3.63 3.32 2.99 
05 4.24 3.38 2.99 2.76 2.60 2.49 2.34 2.16 
.10 2.92 2.53 2.32 2.18 2.09 2.02 1.93 1.82 
.20 1.73 1.72 1.66 1.62 1.59 1.56 1.52 1.47 
26 001 13.74 9.12 7.36 6.41 5.80 5.38 4.83 4.24 
.01 7.72 5.53 4.64 4.14 3.82 3.59 3.29 2.96 
.05 4.22 3.37 2.98 2.74 2.59 2.47 2.32 2.15 
.10 2.91 2.52 2.31 2.17 2.08 2.01 1.92 1.81 
.20 1.73 1.71 1.66 1.62 1.58 1.56 1.52 1.47 
27 001 13.61 9.02 7.27 6.33 5.73 5.31 4.76 4.17 
01 7.68 5.49 4.60 4.11 3.78 3.56 3.26 2.93 
05 4.21 3.35 2.96 2.73 2.57 2.46 2.30 2.13 
.10 2.90 2.51 2.30 2.17 2.07 2.00 1.91 1.80 
.20 1.73 1.71 1.66 1.61 1.58 1.55 1.51 1.46 
28 001 13.50 8.93 7.19 6.25 5.66 5.24 4.69 4.11 
01 7.64 5.45 4.57 4.07 3.75 3.53 3.23 2.90 
05 4.20 3.34 2.95 2.71 2.56 2.44 2.29 2.12 
10 2.89 2.50 2.29 2.16 2.06 2.00 1.90 1.79 
.20 1.72 1.71 1.65 1.61 1.57 1.55 1.51 1.46 

29 .001 13.39 8.85 7.12 6.19 5.59 5.18 4.64 4.05 
01 7.60 5.42 4.54 4.04 3.73 3.50 3.20 2.87 

05 4.18 3.33 2.93 2.70 2.54 2.43 2.28 2.10 

10 2.89 2.50 2.28 2.15 2.06 1.99 1.89 1.78 

20 1.72 1.70 1.65 1.60 1.57 1.54 1.50 1.45 

30 001 13.29 8.77 7.05 6.12 5.53 5.12 4.58 4.00 
01 7.56 5.39 4.51 4.02 3.70 3.47 3.17 2.84 

05 4.17 3.32 2.92 2.69 2.53 2.42 2.27 2.09 

.10 2.88 2.49 2.28 2.14 2.05 1.98 1.88 1.77 

20 1.72 1.70 1.64 1.60 1.57 1.54 1.50 1.45 

40 .001 12.61 8.25 6.60 5.70 5.13 4.73 4.21 3.64 
.01 7.31 5.18 4.31 3.83 3.51 3.29 2.99 2.66 

.05 4.08 3.23 2.84 2.61 2.45 2.34 2.18 2.00 

.10 2.84 2.44 2.23 2.09 2.00 1.93 1.83 1.71 

20 1.70 1.68 1.62 1.57 1.54 1.51 1.47 1.41 

60 001 11.97 7.76 6.17 5.31 4.76 4.37 3.87 3.31 
01 7.08 4.98 4.13 3.65 3.34 3.12 2.82 2.50 

.05 4.00 3.15 2.76 2.52 2.37 2.25 2.10 1.92 

.10 2.79 2.39 2.18 2.04 1.95 1.87 1.77 1.66 

.20 1.68 1.65 1.59 1.55 1.51 1.48 1.44 1.38 

120 .001 11.38 7.31 5.79 4.95 4.42 4.04 3.55 3.02 
01 6.85 4.79 3.95 3.48 3.17 2.96 2.66 2.34 

05 3.92 3.07 2.68 2.45 2.29 2.17 2.02 1.83 

.10 2.75 2.35 2.13 1.99 1.90 1.82 1.72 1.60 

.20 1.66 1.63 1.57 1.52 1.48 1.45 1.41 1.35 

∞ .001 10.83 6.91 5.42 4.62 4.10 3.74 3.27 2.74 
01 6.64 4.60 3.78 3.32 3.02 2.80 2.51 2.18 

.05 3.84 2.99 2.60 2.37 2.21 2.09 1.94 1.75 

10 2.71 2.30 2.08 1.94 1.85 1.77 1.67 1.55 

.20 1.64 1.61 1.55 1.50 1.46 1.43 1.38 1.32 


Source: Reproduced from E. F. Lindquist, Design and Analysis of Experiments in Psychology and 
Education, Houghton Mifflin, Boston, 1953, pp. 41-44, with the permission of the publisher. 


TABLE B.2 
Percentile Points of Studentized Range Statistic 


90th Percentiles 


number of groups 


df error 2 3 4 5 6 7 8 9 10 

1 8.929 13.44 16.36 18.49 20.15 21.51 22.64 23.62 24.48 
2 4.130 5.733 6.773 7.538 8.139 8.633 9.049 9.409 9.725 
3 3.328 4.467 5.199 5.738 6.162 6.511 6.806 7.062 7.287 
4 3.015 3.976 4.586 5.035 5.388 5.679 5.926 6.139 6.327 
5 2.850 3.717 4.264 4.664 4.979 5.238 5.458 5.648 5.816 
6 2.748 3.559 4.065 4.435 4.726 4.966 5.168 5.344 5.499 
7 2.680 3.451 3.931 4.280 4.555 4.780 4.972 5.137 5.283 
8 2.630 3.374 3.834 4.169 4.431 4.646 4.829 4.987 5.126 
9 2.592 3.316 3.761 4.084 4.337 4.545 4.721 4.873 5.007 
10 2.563 3.270 3.704 4.018 4.264 4.465 4.636 4.783 4.913 
11 2.540 3.234 3.658 3.965 4.205 4.401 4.568 4.711 4.838 
12 2.521 3.204 3.621 3.922 4.156 4.349 4.511 4.652 4.716 
13 2.505 3.179 3.589 3.885 4.116 4.305 4.464 4.602 4.724 
14 2.491 3.158 3.563 3.854 4.081 4.267 4.424 4.560 4.680 
15 2.479 3.140 3.540 3.828 4.052 4.235 4.390 4.524 4.641 
16 2.469 3.124 3.520 3.804 4.026 4.207 4.360 4.492 4.608 
17 2.460 3.110 3.503 3.784 4.004 4.183 4.334 4.464 4.579 
18 2.452 3.098 3.488 3.767 3.984 4.161 4.311 4.440 4.554 
19 2.445 3.087 3.474 3.751 3.966 4.142 4.290 4.418 4.531 
20 2.439 3.078 3.462 3.736 3.950 4.124 4.271 4.398 4.510 
24 2.420 3.047 3.423 3.692 3.900 4.070 4.213 4.336 4.445 
30 2.400 3.017 3.386 3.648 3.851 4.016 4.155 4.275 4.381 
40 2.381 2.988 3.349 3.605 3.803 3.963 4.099 4.215 4.317 
60 2.363 2.959 3.312 3.562 3.755 3.911 4.042 4.155 4.254 
120 2.344 2.930 3.276 3.520 3.707 3.859 3.987 4.096 4.191 
∞ 2.326 2.902 3.240 3.478 3.661 3.808 3.931 4.037 4.129 
TABLE B.2 (Continued) 


95th Percentiles 


number of groups 


df error 2 3 4 5 6 7 8 9 10 
1 17.97 26.98 32.82 37.08 40.41 43.12 45.40 47.36 49.07 
2 6.085 8.331 9.798 10.88 11.74 12.44 13.03 13.54 13.99 
3 4.501 5.910 6.825 7.502 8.037 8.478 8.853 9.177 9.462 
4 3.927 5.040 5.757 6.287 6.707 7.053 7.347 7.602 7.826 
5 3.635 4.602 5.218 5.673 6.033 6.330 6.582 6.802 6.995 
6 3.461 4.339 4.896 5.305 5.628 5.895 6.122 6.319 6.493 
7 3.344 4.165 4.681 5.060 5.359 5.606 5.815 5.998 6.158 
8 3.261 4.041 4.529 4.886 5.167 5.399 5.597 5.767 5.918 
9 3.199 3.949 4.415 4.756 5.024 5.244 5.432 5.595 5.739 
10 3.151 3.877 4.327 4.654 4.912 5.124 5.305 5.461 5.599 
11 3.113 3.820 4.256 4.574 4.823 5.028 5.202 5.353 5.487 
12 3.082 3.773 4.199 4.508 4.751 4.950 5.119 5.265 5.395 
13 3.055 3.735 4.151 4.453 4.690 4.885 5.049 5.192 5.318 
14 3.033 3.702 4.111 4.407 4.639 4.829 4.990 5.131 5.254 
15 3.014 3.674 4.076 4.367 4.595 4.782 4.940 5.077 5.198 
16 2.998 3.649 4.046 4.333 4.557 4.741 4.897 5.031 5.150 
17 2.984 3.628 4.020 4.303 4.524 4.705 4.858 4.991 5.108 
18 2.971 3.609 3.997 4.277 4.495 4.673 4.824 4.956 5.071 
19 2.960 3.593 3.977 4.253 4.469 4.645 4.794 4.924 5.038 
20 2.950 3.578 3.958 4.232 4.445 4.620 4.768 4.896 5.008 
24 2.919 3.532 3.901 4.166 4.373 4.541 4.684 4.807 4.915 
30 2.888 3.486 3.845 4.102 4.302 4.464 4.602 4.720 4.824 
40 2.858 3.442 3.791 4.039 4.232 4.389 4.521 4.635 4.735 
60 2.829 3.399 3.737 3.977 4.163 4.314 4.441 4.550 4.646 


120 2.800 3.356 3.685 3.917 4.096 4.241 4.363 4.468 4.560 
∞ 2.772 3.314 3.633 3.858 4.000 4.170 4.286 4.387 4.474 


TABLE B.3 
Critical Values for Dunnett’s Test 


Two-Tailed Comparisons 
k = number of treatment means, including control 


df Error α 2 3 4 5 6 7 8 9 10 
5 0.05 2.57 3.03 3.29 3.48 3.62 3.73 3.82 3.90 3.97 
0.01 4.03 4.63 4.98 5.22 5.41 5.56 5.69 5.80 5.89 
6 0.05 2.45 2.86 3.10 3.26 3.39 3.49 3.57 3.64 3.71 
0.01 3.71 4.21 4.51 4.71 4.87 5.00 5.10 5.20 5.28 
7 0.05 2.36 2.75 2.97 3.12 3.24 3.33 3.41 3.47 3.53 


0.01 3.50 3.95 4.21 4.39 4.53 4.64 4.74 4.82 4.89 
8 0.05 2.31 2.67 2.88 3.02 3.13 3.22 3.29 3.35 3.41 
0.01 3.36 3.77 4.00 4.17 4.29 4.40 4.48 4.56 4.62 


9 0.05 2.26 2.61 2.81 2.95 3.05 3.14 3.20 3.26 3.32 
0.01 3.25 3.63 3.85 4.01 4.12 4.22 4.30 4.37 4.43 

10 0.05 2.23 2.57 2.76 2.89 2.99 3.07 3.14 3.19 3.24 
0.01 3.17 3.53 3.74 3.88 3.99 4.08 4.16 4.22 4.28 

11 0.05 2.20 2.53 2.72 2.84 2.94 3.02 3.08 3.14 3.19 
0.01 3.11 3.45 3.65 3.79 3.89 3.98 4.05 4.11 4.16 

12 0.05 2.18 2.50 2.68 2.81 2.90 2.98 3.04 3.09 3.14 
0.01 3.05 3.39 3.58 3.71 3.81 3.89 3.96 4.02 4.07 

13 0.05 2.16 2.48 2.65 2.78 2.87 2.94 3.00 3.06 3.10 
0.01 3.01 3.33 3.52 3.65 3.74 3.82 3.89 3.94 3.99 

14 0.05 2.14 2.46 2.63 2.75 2.84 2.91 2.97 3.02 3.07 
0.01 2.98 3.29 3.47 3.59 3.69 3.76 3.83 3.88 3.93 

15 0.05 2.13 2.44 2.61 2.73 2.82 2.89 2.95 3.00 3.04 
0.01 2.95 3.25 3.43 3.55 3.64 3.71 3.78 3.83 3.88 

16 0.05 2.12 2.42 2.59 2.71 2.80 2.87 2.92 2.97 3.02 


0.01 2.92 3.22 3.39 3.51 3.60 3.67 3.73 3.78 3.83 
17 0.05 2.11 2.41 2.58 2.69 2.78 2.85 2.90 2.95 3.00 
0.01 2.90 3.19 3.36 3.47 3.56 3.63 3.69 3.74 3.79 


18 0.05 2.10 2.40 2.56 2.68 2.76 2.83 2.89 2.94 2.98 
0.01 2.88 3.17 3.33 3.44 3.53 3.60 3.66 3.71 3.75 
19 0.05 2.09 2.39 2.55 2.66 2.75 2.81 2.87 2.92 2.96 


0.01 2.86 3.15 3.31 3.42 3.50 3.57 3.63 3.68 3.72 
20 0.05 2.09 2.38 2.54 2.65 2.72 2.80 2.86 2.90 2.95 
0.01 2.85 3.13 3.29 3.40 3.48 3.55 3.60 3.65 3.69 


24 0.05 2.06 2.35 2.51 2.61 2.70 2.76 2.81 2.86 2.90 

0.01 2.80 3.07 3.22 3.32 3.40 3.47 3.52 3.57 3.61 

30 0.05 2.04 2.32 2.47 2.58 2.66 2.72 2.77 2.82 2.86 

0.01 2.75 3.01 3.15 3.25 3.33 3.39 3.44 3.49 3.52 

40 0.05 2.02 2.29 2.44 2.54 2.62 2.68 2.73 2.77 2.81 

0.01 2.70 2.95 3.09 3.19 3.26 3.32 3.37 3.41 3.44 

60 0.05 2.00 2.27 2.41 2.51 2.58 2.64 2.60 2.73 2.77 

0.01 2.66 2.90 3.03 3.12 3.19 3.25 3.29 3.33 3.37 

120 0.05 1.98 2.24 2.38 2.47 2.55 2.60 2.65 2.69 2.73 


0.01 2.62 2.85 2.97 3.06 3.12 3.18 3.22 3.26 3.29 
∞ 0.05 1.96 2.21 2.35 2.44 2.51 2.57 2.61 2.65 2.69 


0.01 2.58 2.79 2.92 3.00 3.06 3.11 3.15 3.19 3.22 


Reproduced from: C. W. Dunnett (1964). New tables for multiple comparisons with a control, 
Biometrics 20, 482-491. With permission of The Biometric Society. 
TABLE B.5 
Critical Values for Bryant-Paulson Procedure 
Number of Groups 
Error df, Number of Covariates (C), α, 2 3 4 5 6 7 8 10 
3 1 .05 5.42 7.18 8.32 9.17 9.84 10.39 10.56 11.62 
.01 10.28 13.32 15.32 16.80 17.98 18.95 19.77 21.12 
2 .05 6.21 8.27 9.60 10.59 11.37 12.01 12.56 13.44 
.01 11.97 15.56 17.91 19.66 21.05 22.19 23.16 24.75 
3 .05 6.92 9.23 10.73 11.84 12.72 13.44 14.06 15.05 
.01 13.45 17.51 20.17 22.15 23.72 25.01 26.11 27.90 
4 1 .05 4.51 5.84 6.69 7.32 7.82 8.23 8.58 9.15 
.01 7.68 9.64 10.93 11.89 12.65 13.28 13.82 14.70 
2 .05 5.04 6.54 7.51 8.23 8.80 9.26 9.66 10.31 
.01 8.60 10.95 12.43 13.54 14.41 15.14 15.76 16.77 
3 .05 5.51 7.18 8.25 9.05 9.67 10.19 10.63 11.35 
.01 9.59 12.11 13.77 15.00 15.98 16.79 17.47 18.60 
5 1 .05 4.06 5.17 5.88 6.40 6.82 7.16 7.45 7.93 
.01 6.49 7.99 8.97 9.70 10.28 10.76 11.17 11.84 
2 .05 4.45 5.68 6.48 7.06 7.52 7.90 8.23 8.76 
.01 7.20 8.89 9.99 10.81 11.47 12.01 12.47 13.23 
3 .05 4.81 6.16 7.02 7.66 8.17 8.58 8.94 9.52 
.01 7.83 9.70 10.92 11.82 12.54 13.14 13.65 14.48 
6 1 .05 3.79 4.78 5.40 5.86 6.23 6.53 6.78 7.20 
.01 5.83 7.08 7.88 8.48 8.96 9.36 9.70 10.25 
2 .05 4.10 5.18 5.87 6.37 6.77 7.10 7.38 7.84 
.01 6.36 7.15 8.64 9.31 9.85 10.29 10.66 11.28 
3 .05 4.38 5.55 6.30 6.84 7.28 7.64 7.94 8.44 
.01 6.85 8.36 9.34 10.07 10.65 11.13 11.54 12.22 
7 1 .05 3.62 4.52 5.09 5.51 5.84 6.11 6.34 6.72 
.01 5.41 6.50 7.20 7.72 8.14 8.48 8.77 9.26 
2 .05 3.87 4.85 5.47 5.92 6.28 6.58 6.83 7.24 
.01 5.84 7.03 7.80 8.37 8.83 9.21 9.53 10.06 
3 .05 4.11 5.16 5.82 6.31 6.70 7.01 7.29 7.73 
.01 6.23 7.52 8.36 8.98 9.47 9.88 10.23 10.80 
8 1 .05 3.49 4.34 4.87 5.26 5.57 5.82 6.03 6.39 
.01 5.12 6.11 6.74 7.20 7.58 7.88 8.15 8.58 
2 .05 3.70 4.61 5.19 5.61 5.94 6.21 6.44 6.82 
.01 5.48 6.54 7.23 7.74 8.14 8.48 8.76 9.23 
3 .05 3.91 4.88 5.49 5.93 6.29 6.58 6.83 7.23 
.01 5.81 6.95 7.69 8.23 8.67 9.03 9.33 9.84 
10 1 .05 3.32 4.10 4.58 4.93 5.21 5.43 5.63 5.94 
.01 4.76 5.61 6.15 6.55 6.86 7.13 7.35 7.72 
2 .05 3.49 4.31 4.82 5.19 5.49 5.73 5.93 6.27 
.01 5.02 5.98 6.51 6.93 7.27 7.55 7.79 8.19 
3 .05 3.65 4.51 5.05 5.44 5.75 6.01 6.22 6.58 
.01 5.27 6.23 6.84 7.30 7.66 7.96 8.21 8.63 


12 1 05 3.22 3.95 4.0 4.73 4.98 319. 227 5.67 
01 454 5.31 5.79 6.15 6.43 6.67 6.87 7.20 
2 05 335 412 459 4.93 5.20 5.43 5.62 5.92 
01 4.74 5.56 6.07 6.45 6.75 7.00 7.21 7.56 
3 05 3.48 428 4.78 514 5.42 5.65 5.85 6.17 
:01 4.94 5.80 6.34 674 7.05 7.31 7.54 7.90 
14 1 05 3.15 3.85 428 4.59 4.83 5.03 520 5.48 
01 4.39 5.11 5.56 5.89 6.15 6.36 6.55 6.85 
2 05 3.26 3.9 444 476 5.01 522 540 5.69 
01 4.56 5.31 5.78 6.13 6.40 6.63 6.82 7.14 
9 .05 3.37 4413 4.59 4.93 5.19 5.41 5.59 5.89 
:01 4.72 5.51 6.00 6.36 6.65 6.89 7.09 7.42 
16 1 05 3.10 377 419 449 472 4.91 5.07 5.34 
:01 428 4.96 5.39 570 5.95 6.15 6.32 6.60 
2 05 3.19 3.90 432 4.63 4.88 5.07 524 5.52 
01 4.42 5.14 5.58 5.90 6.16 6.37 6.55 6.85 
3 05 320 401 4.46 4.78 5.03 5.23 5.41 5.69 
01 456 5.30 5.76 610 6.37 659 6.77 7.08 
18 1 05 3.06 372 412 441 4.63 4.82 4.98 5.23 
01 4.20 4.86 5.26 5.56 5.79 5.0 6.15 6.42 
2 05 3.14 382 424 454 477 4.96 543 5.39 
.01 4.32 5.00 5.43 2:3 5.98 6.18 6.35 6.63 
9 .05 3,23 3.93 4.35 4.66 4.90 510 527 5.54 
01 4.44 5.15 5.59 5.90 616 636 654 6.83 
20 1 05 3.03 367 407 4.35 457 4.75 4.90 5.15 
01 4.14 4.77 5.17 5.45 5.68 5.86 6.02 6.27 
2 05 3.10 377 417 446 469 4.88 5.03 5.29 
01 425 490 531 5.60 5.84 6.03 6.19 6.46 
3 05 3.18 386 428 4.57 4.81 5.00 5.16 5.42 
:01 4.35 5.03 5.45 3:79 5.99 6.19 6.36 6.63 
24 1 05 2.98 3.61 3.99 426 447 4.65 4.79 5.03 
01 4.05 4.65 5.02 5.20 5.50 568 5.83 6.07 
2 05 3.04 3.60 408 4.35 457 4.75 4.90 5.14 
:01 4.14 4.76 5.14 542 5.63 5.81 5.96 6.21 
3 .05 3.11 376 416 444 4.67 4.85 5.00 525 
:01 4.22 4.86 5.25 5.54 5.76 5.94 6.10 6.35 
30 1 05 2.94 3,55 3.91 4.18 4.38 454 469 4.91 
:01 3.96 4.54 4.89 514 5.34 5.50 5.64 5.87 
2 05 2.99 3.61 3.8 425 446 462 4.77 5.00 
01 403 462 4.98 524 544 5.61 5.75 5.98 
3 05 3.04 367 4.05 432 4.53 470 4.85 5.08 
01 410 4.70 506 533 5.54 5.71 5.85 6.08 
40 1 05 2.89 3.49 384 409 429 445 458 4.80 
01 3.88 443 476 5.00 519 5.34 547 5.68 
2 05 2.93 3.53 3.80 415 4.34 450 464 4.86 
01 3.93 448 4.82 5.07 5.26 541 5.54 5.76 
3 05 2.97 3:57 394 420 440 456 470 492 
.01 3.98 4.54 4.88 5.13 5.32 5.48 5.61 5.83 
60 1 05 2.85 3.43 3:77 4.01 420 4.35 448 4.69 
01 379 432 464 486 5.04 5.18 5.50 5.50 
2 05 2.88 3.46 3.80 405 424 439 452 473 
01 3.83 4.36 4.68 4.90 5.08 5.22 5.35 5.54 
3 05 2.90 3.49 3.83 4.08 427 4.43 4.56 4.77 
01 386 439 472 4.95 5.12. 5:27 5.39 5:59 
120 1 05 2.81 3.37 3.70 3.93 4.11 4.26 438 4.58 
.01 372 422 452 4.73 4.89 5.03 514 5.32 
2 05 2.82 3.38 3.72 3.95 4.13 4.28 | 4.40 4.60 
01 3.79 4.204 4.54 4.75 4.91 5.05 516 5.35 
3 05 2.84 3.40 3.73 397 4.15 430 442 4.62 
01 35 4.25 4.55 477 | 494 5.07 5.18 5.37 


Source: Reproduced with permission of the trustees of Biometrika. 
Table C.1 Power of F Test at α = .05, u = 1 
Table C.2 Power of F Test at α = .05, u = 2 
Table C.3 Power of F Test at α = .05, u = 3 
Table C.4 Power of F Test at α = .05, u = 4 
Table C.5 Power of F Test at α = .10, u = 1 
Table C.6 Power of F Test at α = .10, u = 2 
Table C.7 Power of F Test at α = .10, u = 3 
Table C.8 Power of F Test at α = .10, u = 4 


NOTES 


The quantity u refers to the degrees of freedom for the effect being tested. For a 
one-way ANOVA with a levels, u = a - 1. For a two-way ANOVA with a levels 
for A and b levels for B, u = (a - 1) for the A main effect, u = (b - 1) for the B 
main effect, and u = (a - 1)(b - 1) for the interaction effect. 
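
The rule for u can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the note above; the function names are hypothetical and are not part of SAS, SPSS, or any package used in this text.

```python
def one_way_effect_df(a: int) -> int:
    """u for a one-way ANOVA with a levels: u = a - 1."""
    return a - 1


def two_way_effect_df(a: int, b: int) -> dict:
    """u for each effect in a two-way ANOVA with a levels of A and b levels of B."""
    return {
        "A": a - 1,                # A main effect
        "B": b - 1,                # B main effect
        "AxB": (a - 1) * (b - 1),  # interaction effect
    }


# Example: a 3 x 4 design
print(two_way_effect_df(3, 4))
```

For a 3 x 4 design, the value returned for each effect indicates which power table (by u) applies to that effect, provided a table for that u is available.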

Group size is the assumed common number of subjects in each group. For two 
groups with unequal group sizes n1 and n2, use the harmonic mean 2 n1 n2 / (n1 + n2) 
to enter the table. For more than two groups, use the average group size to enter the 
table. 
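
As a small worked illustration of the group-size rule, the following Python sketch computes the value used to enter the table. The function names are hypothetical, and reading "average group size" as the arithmetic mean is an assumption; the note does not say which average is meant.

```python
def harmonic_mean_n(n1: float, n2: float) -> float:
    """Entry value for two unequal groups: 2 * n1 * n2 / (n1 + n2)."""
    return 2 * n1 * n2 / (n1 + n2)


def average_group_size(sizes: list) -> float:
    """Entry value for more than two groups (assumed here to be the arithmetic mean)."""
    return sum(sizes) / len(sizes)


# Two groups of 10 and 30 subjects: 2 * 10 * 30 / 40
print(harmonic_mean_n(10, 30))
# Three groups of 10, 20, and 30 subjects
print(average_group_size([10, 20, 30]))
```

Because the tables list only selected integer group sizes, the computed value will usually need to be rounded or interpolated between adjacent rows.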
TABLE C.1 
Power of F Test at α = .05, u = 1 
f (effect size) 
Group size 
n .05 .10 .15 .20 .25 .30 .35 .40 .50 .60 .70 .80 
4 05 06 06 07 09 11 13 16 23 30 39 48 
5 05 06 07 08 11 13 16 20 29 39 50 61 
6 05 06 07 09 12 15 20 24 35 47 60 71 
7 05 06 08 10 14 18 23 28 41 55 68 79 
8 05 06 08 11 15 20 26 32 47 62 75 85 
9 05 07 09 12 17 22 29 36 52 68 80 89 
10 05 07 09 13 18 25 32 40 57 73 85 93 
11 05 07 10 14 20 27 35 44 62 77 88 95 
12 05 07 10 15 22 29 38 47 65 81 91 97 
13 05 07 11 16 23 32 41 51 70 84 93 98 
14 05 08 11 17 25 34 44 54 73 87 95 98 
15 06 08 12 18 26 36 47 57 76 89 96 99 
16 06 08 12 19 28 38 49 60 79 91 97 99 
17 06 08 13 20 30 40 52 63 82 93 98 * 
18 06 08 14 21 31 42 54 66 84 94 98 
19 06 09 14 22 33 44 57 68 86 95 99 
20 06 09 15 23 34 46 59 70 88 96 99 
22 06 09 16 26 37 50 63 75 91 97 
24 06 10 17 28 40 54 67 78 93 98 
26 06 10 18 30 43 58 71 82 95 99 
28 06 11 19 32 46 61 74 84 96 99 
30 06 11 21 34 49 66 77 87 97 
32 06 12 22 36 51 67 80 89 98 
34 07 12 23 38 54 69 82 91 98 
36 07 13 24 40 56 72 84 92 99 
38 07 13 25 41 59 74 86 94 99 
40 07 14 27 43 61 77 88 95 99 
44 07 15 29 47 65 80 91 96 
48 07 16 31 50 69 84 93 97 
52 08 17 33 53 73 87 95 98 
56 08 18 36 57 76 89 96 99 
60 08 19 38 60 79 91 97 99 
64 08 20 40 62 81 93 98 * 
68 08 21 42 65 83 94 98 
72 09 22 44 68 85 95 99 
76 09 23 46 70 87 96 99 
80 09 24 48 72 89 97 99 
100 10 29 57 81 94 99 
140 13 39 72 92 99 
200 16 52 86 98 
TABLE C.2 
Power of F Test at α = .05, u = 2 
f (effect size) 
Group size 
n .05 .10 .15 .20 .25 .30 .35 .40 .50 .60 .70 .80 
4 05 06 06 08 09 11 14 17 24 33 44 54 
5 05 06 07 09 11 14 17 22 32 44 56 69 
6 05 06 07 10 13 16 21 26 39 53 67 79 
7 05 06 08 11 14 19 25 31 46 62 76 87 
8 05 06 08 12 16 22 28 36 53 69 83 92 
9 05 07 09 13 18 24 32 40 59 75 88 95 
10 05 07 10 14 20 27 35 45 64 81 91 97 
11 05 07 10 15 21 30 39 49 69 85 94 98 
12 06 07 11 16 23 32 42 53 74 88 96 98 
13 06 08 11 17 25 35 46 57 77 91 97 99 
14 06 08 12 18 27 38 49 61 81 93 98 * 
15 06 08 13 20 29 40 52 64 84 95 99 
16 06 08 13 21 31 43 55 67 86 96 99 
17 06 09 14 22 33 45 58 70 89 97 99 
18 06 09 14 23 34 48 61 73 90 98 * 
19 06 09 15 24 36 50 64 76 92 99 
20 06 09 16 26 38 52 66 78 93 99 
22 06 10 17 28 42 57 71 82 96 99 
24 06 10 18 30 45 61 75 86 97 
26 06 11 20 33 48 65 79 89 98 
28 06 11 21 35 52 68 82 91 99 
30 06 12 22 37 55 71 85 93 99 
32 07 12 24 40 58 75 87 94 99 
34 07 13 25 42 61 77 89 96 
36 07 13 26 44 63 80 91 97 
38 07 14 28 46 66 82 92 97 
40 07 15 29 48 68 84 94 98 
44 07 16 32 53 73 88 96 99 
48 08 17 34 57 77 90 97 99 
52 08 18 37 60 80 93 98 
56 08 19 40 64 83 94 99 
60 08 21 42 67 86 96 99 
64 08 22 45 70 88 97 99 
68 09 23 47 73 90 98 * 
72 09 24 49 75 92 98 
76 09 25 52 78 93 99 
80 09 27 54 80 94 99 
100 11 32 64 88 98 
140 14 44 79 97 
200 18 59 92 
TABLE C.3 
Power of F Test at α = .05, u = 3 
f (effect size) 
Group size 
n .05 .10 .15 .20 .25 .30 .35 .40 .50 .60 .70 .80 
4 05 06 07 08 10 12 15 18 27 38 50 62 
5 05 06 07 09 12 15 19 24 36 50 64 76 
6 05 06 08 10 13 18 23 29 44 60 75 86 
7 05 06 08 11 15 21 27 35 52 69 83 92 
8 05 07 09 12 17 24 31 40 59 77 89 96 
9 05 07 09 14 19 27 36 46 66 82 93 98 
10 05 07 10 15 21 30 40 51 71 87 96 99 
11 06 07 11 16 24 33 44 55 76 91 97 99 
12 06 08 11 17 26 36 48 60 81 93 98 * 
13 06 08 12 19 28 39 52 64 84 95 99 
14 06 08 13 20 30 42 55 68 87 97 99 
15 06 08 13 21 32 45 59 71 90 98 * 
16 06 09 14 23 34 48 62 75 92 98 
17 06 09 15 24 37 51 65 78 94 99 
18 06 09 16 26 39 53 68 80 95 99 
19 06 09 16 27 41 56 71 83 96 99 
20 06 10 17 28 43 59 73 85 97 
22 06 10 18 31 47 63 78 88 98 
24 06 11 20 34 51 68 82 91 99 
26 06 11 22 37 54 72 85 94 99 
28 07 12 23 39 58 75 88 95 
30 07 13 25 42 61 79 90 96 
32 07 13 26 45 65 81 92 97 
34 07 14 28 47 68 84 94 98 
36 07 14 29 50 70 86 95 99 
38 07 15 31 52 73 88 96 99 
40 07 16 32 54 76 90 97 99 
44 08 17 35 59 80 93 98 
48 08 18 39 63 84 95 99 
52 08 20 42 67 87 96 99 
56 08 21 45 71 89 97 
60 09 22 47 74 91 98 
64 09 24 50 77 93 99 
68 09 25 53 80 95 99 
72 09 27 56 82 96 99 
76 10 28 58 84 97 * 
80 10 29 61 86 97 
100 11 36 71 93 99 
140 14 49 86 99 
200 19 66 96 
TABLE C.4 
Power of F Test at α = .05, u = 4 
f (effect size) 
Group size 
n .05 .10 .15 .20 .25 .30 .35 .40 .50 .60 .70 .80 
4 05 06 07 08 10 13 16 20 30 42 56 69 
5 05 06 07 09 12 16 21 26 40 55 70 83 
6 05 06 08 10 14 19 25 32 49 66 81 91 
7 05 06 09 12 16 22 30 39 58 76 88 96 
8 05 07 09 13 19 26 35 45 65 83 93 98 
9 05 07 10 14 21 29 40 51 72 88 96 99 
10 06 07 10 16 23 33 44 56 78 92 98 * 
11 06 08 11 17 26 37 49 61 82 94 99 
12 06 08 12 19 28 40 53 66 86 96 99 
13 06 08 13 20 31 43 57 70 89 98 * 
14 06 08 13 22 33 47 61 74 92 98 
15 06 09 14 23 36 50 65 78 94 99 
16 06 09 15 25 38 53 68 81 95 99 
17 06 09 16 26 40 56 71 83 96 * 
18 06 09 17 28 43 59 74 86 97 
19 06 10 17 30 45 62 71 88 98 
20 06 10 18 31 47 65 79 90 99 
22 06 11 20 34 52 69 84 93 99 
24 06 11 22 37 56 74 87 95 
26 07 12 23 40 60 78 90 96 
28 07 13 25 43 64 81 92 98 
30 07 13 27 46 67 84 94 99 
32 07 14 29 49 71 87 96 99 
34 07 15 30 52 74 89 97 
36 07 15 32 55 76 91 97 
38 07 16 34 57 79 92 98 
40 07 17 36 60 81 94 99 
44 08 18 39 65 85 96 99 
48 08 20 43 69 91 97 * 
52 08 21 46 73 93 98 
56 09 23 49 77 95 99 
60 09 24 52 80 96 99 
64 09 26 55 83 97 * 
68 09 28 58 85 98 
72 10 29 61 87 98 
76 10 31 64 89 99 
80 10 32 66 91 
100 12 40 77 96 
140 15 54 91 99 
200 20 72 98 
TABLE C.5 
Power of F Test at α = .10, u = 1 
f (effect size) 
Group size 
n .05 .10 .15 .20 .25 .30 .35 .40 .50 .60 .70 .80 
4 10 11 13 14 17 20 23 27 36 45 55 64 
5 10 11 13 16 19 23 27 32 43 55 66 76 
6 10 12 14 17 21 26 31 37 50 63 74 83 
7 10 12 15 19 23 29 35 42 56 69 80 89 
8 10 12 15 20 25 32 39 47 62 75 85 92 
9 10 13 16 21 28 35 43 51 66 80 89 95 
10 10 13 17 23 30 37 46 55 71 83 92 97 
11 11 13 18 24 32 40 49 58 75 87 94 98 
12 11 14 19 25 34 43 52 62 78 89 96 99
13 11 14 19 27 36 45 55 65 81 91 97 99 
14 11 14 20 28 37 48 58 68 83 93 98 99 
15 11 15 21 29 39 50 60 70 86 95 98 *
16 11 15 22 31 41 52 63 73 88 96 99 
17 11 15 23 32 43 54 65 75 89 97 99
18 11 16 23 33 45 56 68 77 91 97 99 
19 11 16 24 34 46 58 70 79 92 98 * 
20 11 16 25 36 48 60 72 81 93 98 
22 11 17 26 38 51 64 75 84 95 99 
24 12 18 28 40 54 67 78 87 96 99 
26 12 19 29 43 57 70 81 89 97 * 
28 12 19 31 45 60 73 84 91 98 
30 12 20 32 47 62 76 86 93 99
32 12 21 34 49 65 78 89 94 99 
34 12 21 35 51 67 80 90 95 99 
36 13 22 36 53 69 82 91 96 * 
38 13 23 38 55 71 84 92 97 
40 13 24 39 57 73 85 93 97 * 
44 13 25 42 60 77 88 95 98
48 14 26 44 63 80 91 96 99 
52 14 28 47 66 82 92 97 99 
56 14 29 49 69 85 94 98 * 
60 15 30 51 72 87 95 99
64 15 31 53 74 89 96 99 
68 16 33 56 76 90 97 99 
72 16 34 58 78 92 98 99 
76 16 35 59 80 93 98 *
80 17 36 61 82 94 99 
100 18 42 70 89 97 
140 22 53 82 96 99 
200 27 65 92 99 
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TABLE C.6 
Power of F Test at α = .10, u = 2
f (effect size)
Group
Size
n .05 .10 .15 .20 .25 .30 .35 .40 .50 .60 .70 .80
4 10 11 13 15 17 20 24 28 38 48 59 70 
5 10 12 13 16 20 24 29 34 46 59 71 81 
6 10 12 14 18 22 27 33 40 54 68 80 89 
7 10 12 15 19 24 30 37 45 61 75 86 93 
8 11 13 16 21 27 34 41 50 67 81 90 96 
9 11 13 17 22 29 37 45 55 72 85 94 98 
10 11 13 18 24 31 40 49 59 76 89 96 99 
11 11 14 18 25 33 43 53 63 80 92 97 99 
12 11 14 19 27 36 46 56 67 84 94 98 *
13 11 14 20 28 38 49 60 70 86 95 99 
14 11 15 21 30 40 51 63 73 89 97 99 
15 11 15 22 31 42 54 66 76 91 97 *
16 11 16 23 32 44 56 68 79 92 98 
17 11 16 24 34 46 59 71 81 94 99 
18 11 16 24 35 48 61 73 83 95 99
19 11 17 25 37 50 63 75 85 96 99
20 12 17 26 38 52 65 77 87 97 * 
22 12 18 28 41 55 69 81 90 98 
24 12 19 29 43 59 73 84 92 99 
26 12 19 31 46 62 76 87 94 99 
28 12 20 33 48 65 79 89 95 99 
30 12 21 34 51 68 82 91 96 
32 13 22 36 53 70 84 93 97 
34 13 22 37 55 73 86 94 98 
36 13 23 39 57 75 88 95 98 
38 13 24 40 60 77 89 96 99
40 13 25 42 62 79 91 97 99 *
44 14 26 45 65 82 93 98
48 14 28 48 69 85 95 99 
52 15 29 50 72 88 96 99 
56 15 31 53 75 90 97 99 
60 15 32 55 78 92 98 
64 16 33 58 80 93 98 
68 16 35 60 82 95 99 
72 17 36 62 84 96 99 
76 17 38 65 86 96 99 
80 17 39 67 88 97 * 
100 19 45 79 93 99
140 23 57 87 98 
200 29 71 96 
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TABLE C.7 
Power of F Test at α = .10, u = 3
f (effect size)
Group
Size
n .05 .10 .15 .20 .25 .30 .35 .40 .50 .60 .70 .80
4 10 11 13 15 18 21 25 30 41 53 65 76 
5 10 12 14 17 20 25 30 37 50 64 77 87 
6 10 12 15 18 23 29 35 43 59 73 85 93 
7 11 12 15 20 26 32 40 49 66 81 91 96 
8 11 13 16 22 28 36 45 54 72 86 94 98 
9 11 13 17 23 31 40 49 59 78 90 97 99 
10 11 14 18 25 33 43 54 64 82 93 98 * 
11 11 14 19 27 36 46 58 68 86 95 99
12 11 14 20 28 38 50 61 72 89 97 99 
13 11 15 21 30 41 53 65 76 91 98 * 
14 11 15 22 31 43 56 68 79 93 98 
15 11 16 23 33 45 59 71 82 95 99 
16 11 16 24 35 48 61 74 84 96 99 
17 11 16 25 36 50 64 77 86 97 *
18 11 17 26 38 52 66 79 88 98
19 12 17 27 39 54 69 81 90 98 
20 12 18 28 41 56 71 83 91 99
22 12 18 29 44 60 75 86 94 99 
24 12 19 31 47 64 79 89 95 *
26 12 20 33 50 67 82 91 97 
28 12 21 35 53 70 84 93 98 
30 13 22 37 55 73 87 95 98
32 13 23 39 58 76 89 96 99 
34 13 23 40 60 78 91 97 99 
36 13 24 42 63 81 92 98 99 
38 14 25 44 65 83 93 98 * 
40 14 26 45 67 84 94 99 
44 14 28 49 71 88 96 99 
48 15 29 52 75 90 97 *
52 15 31 55 78 92 98 
56 15 33 58 81 94 99 
60 16 34 60 83 95 99 
64 16 36 63 85 96 99 
68 17 37 66 88 97 *
72 17 39 68 89 98 
76 17 41 70 91 98 
80 18 42 72 92 99 
100 20 49 81 96 
140 24 62 92 99 
200 30 77 98
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TABLE C.8 
Power of F Test at α = .10, u = 4
f (effect size)
Group
Size
n .05 .10 .15 .20 .25 .30 .35 .40 .50 .60 .70 .80
4 10 11 13 15 18 22 27 32 44 57 70 81
5 10 12 14 17 21 26 32 39 54 69 82 91 
6 10 12 15 19 24 31 38 45 63 79 89 96 
7 11 13 16 21 27 35 43 53 71 85 94 98 
8 11 13 17 23 30 39 48 59 77 90 97 99 
9 11 13 18 24 33 43 53 64 82 94 98 * 
10 11 14 19 26 36 47 58 69 87 96 99 
11 11 14 20 28 38 50 62 73 90 97 * 
12 11 15 21 30 41 54 66 77 92 98 
13 11 15 22 32 44 57 70 81 94 99 
14 11 16 23 34 46 60 73 84 96 99 
15 11 16 24 35 49 63 76 86 97 * 
16 11 16 25 37 51 66 79 88 98 
17 11 17 26 39 54 69 81 90 98
18 12 17 27 41 56 71 84 92 99
19 12 18 28 42 58 74 86 93 99 
20 12 18 29 44 61 76 87 94 99 
22 12 19 31 47 65 80 90 96 
24 12 20 33 51 69 83 93 97 
26 12 21 35 54 72 86 95 98
28 13 22 37 57 75 89 96 99 
30 13 23 39 60 78 91 97 99 
32 13 24 41 62 81 92 98 * 
34 13 25 43 65 83 94 98 
36 14 26 45 67 85 95 99 
38 14 26 47 70 87 96 99
40 14 27 49 72 89 97 99 * 
44 14 29 52 76 91 98 
48 15 31 56 79 94 99 
52 15 33 59 83 95 99 
56 16 35 62 85 96 *
60 16 37 65 88 97 
64 17 38 68 90 98 
68 17 40 70 91 99 
72 18 42 73 93 99 
76 18 44 75 94 99 
80 19 45 77 95 * 
100 21 53 86 98 
140 25 67 95 
200 32 82 99 
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Answers to Selected Exercises 


CHAPTER 1 


1. (a) The $22,000 figure was misleading because of the extreme salaries of 
$70,000 and $250,000. Recall that the mean is very sensitive to extreme 
values. 

(b) The median should have been used. It is essentially unaffected by ex- 
treme values. The median for this set of data is $15,000, and indicates 
where most of the salaries are concentrated. 


3. She should not be concerned, since considerable research has shown that a 
violation of the normality assumption has little effect on the Type I error 
rate. 


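5. (a) \( \sum c x_i = \sum 3x_i = 3\sum x_i = 3(5 + 8 + 1 + 7) = 63 \)
(b) First, note that the mean for the scores \( cx_1, cx_2, \ldots, cx_n \) is given by

\[ \bar{x}_{cx} = \frac{cx_1 + cx_2 + \cdots + cx_n}{n} = \frac{c(x_1 + x_2 + \cdots + x_n)}{n} = c\bar{x} \]

where \( \bar{x} \) is the mean for \( x_1, x_2, \ldots, x_n \).
Now, using the definitional formula for variance, we have

\[ s_{cx}^2 = \frac{\sum (cx_i - c\bar{x})^2}{n-1} = \frac{\sum [c(x_i - \bar{x})]^2}{n-1} = \frac{\sum c^2 (x_i - \bar{x})^2}{n-1} = \frac{c^2 \sum (x_i - \bar{x})^2}{n-1} = c^2 s_x^2, \]

as was to be proved.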


(c) The grand mean for the groups is 6.3. Therefore,

\[ \sum 10(\bar{x}_j - \bar{x})^2 = \sum 10(\bar{x}_j - 6.3)^2 = 10[(4.1 - 6.3)^2 + (8.5 - 6.3)^2] = 96.8 \]


7. (a) The correlation for all 14 data points is .587, indicating a moderate re- 
lationship between height and weight. 
(b) The outlier is subject 10, whose weight of only 115 lbs is very unusual
for someone over 6 ft tall.
(c) The correlation without subject 10 is now .867, indicating that there is 
indeed a strong relationship between height and weight. 
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. (a) The 95% confidence interval for the first study is given by 4 + 2.101 


(.57), or (2.8, 5.2), while the 95% confidence interval for the second study
is given by 4 ± 1.96(1.74), or (.59, 7.41). The null hypothesis of equal pop-
ulation means is rejected in both cases, since 0 is not in either interval.
(b) We can be confident of clinical significance in the first study since the 
interval is indicating that the difference in the population means is at least 
2.8, that is, greater than 2. We cannot be confident of clinical significance 
in the second study since the interval indicates the population mean differ- 
ence could be as small as .59. 


CHAPTER 2 


. Overall \( \alpha = 1 - (1 - .05)^{21} = 1 - .34 = .66 \)


. (a) \( df_b = 2 \), \( df_w = 27 \). Therefore, the critical value at the .05 level is 3.35.
(b) \( df_b = 3 \), \( df_w = 76 \). Therefore, the critical value at the .10 level is 2.17.
(c) \( df_b = 4 \), \( df_w = 35 \). Therefore, the critical value at the .01 level is 3.9.


. Using a calculator, we obtain first the means and variances for each group:

                  GROUP 1   GROUP 2   GROUP 3   GROUP 4
\(\bar{x}_i\)       4         9         4.8       6.5
\(s_i^2\)           3.33      4         3.7       3

Now, sum of squares within is given by

\[ SS_w = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2 = 3(3.33) + 2(4) + 4(3.7) + 3(3) = 41.79 \]
\[ MS_w = SS_w/(N - k) = 41.79/12 = 3.48 \]
\[ SS_b = 4(4 - 5.81)^2 + 3(9 - 5.81)^2 + 5(4.8 - 5.81)^2 + 4(6.5 - 5.81)^2 = 50.64 \]
\[ MS_b = SS_b/(k - 1) = 50.64/3 = 16.88 \]

Therefore, \( F = MS_b/MS_w = 16.88/3.48 = 4.85 \).
The critical value at the .05 level is 3.49. Thus, we reject the null hypothesis and conclude there
is an overall difference among the groups.
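The arithmetic above can be checked with a short script. This is only a sketch: the function name `one_way_anova` is hypothetical, and the group sizes 4, 3, 5, and 4 are an assumption read off the multipliers in the sums of squares (each n_i - 1 and n_i).

```python
def one_way_anova(sizes, means, variances):
    """One-way ANOVA from summary statistics.

    Returns (SS_between, SS_within, MS_between, MS_within, F).
    """
    total_n = sum(sizes)
    k = len(sizes)
    # Weighted grand mean of the group means.
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / total_n
    # Within: pooled sum of (n_i - 1) * s_i^2.
    ss_within = sum((n - 1) * v for n, v in zip(sizes, variances))
    # Between: sum of n_i * (mean_i - grand mean)^2.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ms_within = ss_within / (total_n - k)
    ms_between = ss_between / (k - 1)
    return ss_between, ss_within, ms_between, ms_within, ms_between / ms_within


# Summary statistics from the answer; sizes are assumed (see lead-in).
ss_b, ss_w, ms_b, ms_w, f_ratio = one_way_anova(
    sizes=[4, 3, 5, 4],
    means=[4, 9, 4.8, 6.5],
    variances=[3.33, 4, 3.7, 3],
)
print(round(ss_w, 2), round(ss_b, 2), round(f_ratio, 2))
```

Small differences in the last decimal place can arise because the hand calculation rounds the grand mean to 5.81.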


7. Recall that the formula for obtaining the intervals is 


\[ (\bar{x}_i - \bar{x}_j) \pm q_{\alpha;k,N-k}\sqrt{MS_w/n} \]


11. 


13. 
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where 4 is studentized range statistic (Table D), MS,, is the error term from 


the ANOVA, and n is the common number of subjects per group.

\[ q_{\alpha;k,N-k}\sqrt{MS_w/n} = 3.312\sqrt{22.35/15} = 4.043 \]

Now, we obtain the confidence intervals:

GROUPS                CRITICAL VALUE   CONFIDENCE INTERVALS
x̄1 - x̄2 = -1.7       4.043            (-5.743, 2.343)
x̄1 - x̄3 = 1.5        4.043            (-2.543, 5.543)
x̄1 - x̄4 = -3.1       4.043            (-7.143, .943)
x̄2 - x̄3 = 3.2        4.043            (-.843, 7.243)
x̄2 - x̄4 = -1.4       4.043            (-5.443, 2.643)
x̄3 - x̄4 = -4.6       4.043            (-8.643, -.557)


Since the intervals for the first 5 paired comparisons all cover 0, none of 
these are significant. Only the last paired comparison is significant, since 
that interval does not cover 0, that is, 0 is not a likely value for μ3 - μ4.
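The intervals can be reproduced with a few lines of Python. This is a sketch under stated assumptions: `tukey_intervals` is a hypothetical helper, the critical value 3.312 is taken as given from the studentized range table, and each mean difference is taken as the midpoint of the corresponding interval in the answer.

```python
import math


def tukey_intervals(mean_diffs, q_crit, ms_within, n_per_group):
    """Return the half-width and a dict of (lower, upper) simultaneous intervals."""
    half_width = q_crit * math.sqrt(ms_within / n_per_group)
    intervals = {
        pair: (diff - half_width, diff + half_width)
        for pair, diff in mean_diffs.items()
    }
    return half_width, intervals


# Pairwise mean differences (group i minus group j), assumed from the answer.
diffs = {
    (1, 2): -1.7, (1, 3): 1.5, (1, 4): -3.1,
    (2, 3): 3.2, (2, 4): -1.4, (3, 4): -4.6,
}
half_width, intervals = tukey_intervals(
    diffs, q_crit=3.312, ms_within=22.35, n_per_group=15
)

# A comparison is significant when its interval excludes 0.
significant = [pair for pair, (lo, hi) in intervals.items() if lo > 0 or hi < 0]
print(round(half_width, 3), significant)
```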


The estimate of the contrast is

\[ \hat{\psi} = (5.6 + 7.3)/2 - (8.1 + 4.2)/2 = .30 \]
\[ \sum c_i^2/n_i = (.5)^2/10 + (.5)^2/8 + (-.5)^2/11 + (-.5)^2/13 = .098 \]
\[ F = \frac{(.30)^2/.098}{87} < 1 \]

Since we have more error variation than effect variation, the contrast is
clearly not significant.


O’Grady’s statement relates to the restriction of range phenomenon you
encountered when studying the Pearson correlation in your introductory
statistics course. In this case there would undoubtedly be the least amount
of variance in heart efficiency to account for in a population of runners
(more homogeneous), while a random sample of the American adult population
is much more heterogeneous and therefore offers the potential of accounting
for more variance.


The form of the control lines for running the analysis is identical to that
presented in the chapter. To obtain both the Scheffé and Tukey intervals in
one run, simply insert in the MEANS statement:
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MEANS REGION/SCHEFFE TUKEY; 


where REGION is the name I have given to the grouping variable. 

(a) There is a significant overall difference at the .05 level, since from the 
printout we have F = 3.38, p = .0285. 

(b) There are no significant pairwise differences found with the Scheffé 
procedure, while a significant pairwise difference, between Groups 1 and 
4, is found with the Tukey procedure. 

(c) The Scheffé is a more conservative procedure than Tukey, and is not as 
powerful for detecting pairwise differences. 


(a) The 5 comparisons are given schematically:

      KEYWORD   EXPERIENTIAL   PICTURE   CONTROL
L1       1           0            0        -1
L2       1           0           -1         0
L3       1          -1            0         0
L4       0           0            1        -1
L5       0           1            0        -1

(b) First of all, note that the set of 5 comparisons must have dependencies,
since there are only 3 degrees of freedom between, and hence at most 3
independent comparisons. If we compute the sum of products for L1 and L2
we find

1(1) + 0(0) + 0(-1) + (-1)(0) = 1

Therefore, there is a dependency for L1 and L2.
(c) Since the group sizes are equal (n = 16), the error term (\(MS_w\)) for each
contrast is simply the average of the group variances. Therefore,

\[ MS_w = \frac{(22.9)^2 + (27)^2 + (23.1)^2 + (25.6)^2}{4} = 610.6 \]

Recall from page 68 that if the group sizes are equal, then the F statistic for
testing a contrast for significance is

\[ F = \frac{n\hat{\psi}^2 / \sum c_i^2}{MS_w} \]
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Keyword Versus Control 


\[ F = \frac{16(72.3 - 48.7)^2/2}{610.6} = 7.2972 \Rightarrow t = 2.701 \]

Keyword Versus Picture

\[ F = \frac{16(72.3 - 42.4)^2/2}{610.6} = 11.713 \Rightarrow t = 3.42 \]

Keyword Versus Experiential

\[ F = \frac{16(72.3 - 36.2)^2/2}{610.6} = 17.074 \Rightarrow t = 4.132 \]
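The three contrast tests share one pattern, which the following sketch makes explicit. The helper name `pairwise_contrast_f` is hypothetical. The group standard deviations and means are those used in the answer; the experiential mean of 36.2 is an assumption inferred from the reported F of 17.074.

```python
import math


def pairwise_contrast_f(mean_a, mean_b, n_per_group, ms_within):
    """F for a pairwise contrast with coefficients 1 and -1 (sum of c^2 = 2)."""
    psi_hat = mean_a - mean_b
    return (n_per_group * psi_hat ** 2 / 2) / ms_within


# With equal group sizes, the error term is the average of the group variances.
group_sds = [22.9, 27, 23.1, 25.6]
ms_within = sum(sd ** 2 for sd in group_sds) / len(group_sds)

keyword_mean = 72.3
other_means = {"control": 48.7, "picture": 42.4, "experiential": 36.2}

results = {}
for name, mean in other_means.items():
    f_value = pairwise_contrast_f(keyword_mean, mean, 16, ms_within)
    # For a single-df contrast, t is the square root of F.
    results[name] = (f_value, math.sqrt(f_value))
    print(name, round(f_value, 3), round(math.sqrt(f_value), 3))
```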
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(a) The null hypothesis is ш = Из = Из. It is rejected at the .10 level since F 


= 3.115 and p = .053.

(b) The Levene test is not significant at the .05 level, since p = .766.

(c) For the Tukey procedure at the .10 level, only Groups 1 and 3 are
significantly different.
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SELECTED PRINTOUT FROM SPSS FOR WINDOWS 


Insert art from p. 404 of previous edition. (x 41.5 pi) 


19. 
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(a) 1- (12 0)3 =1– (1– 30/+ 3a? – 003) = 3o – 307? + 03 
(b) \( 3\alpha = 3(.01) = .03 \)
\( 3\alpha - 3\alpha^2 + \alpha^3 = 3(.01) - 3(.01)^2 + (.01)^3 = .0297 \)

What we have shown is that the two quantities are approximately the same
for small α.
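A quick numerical check of the approximation, as a sketch. The function names are hypothetical, and the exact expression assumes the tests are independent, which is the setting in which 1 - (1 - α)^k applies.

```python
def familywise_alpha(alpha, n_tests):
    """Probability of at least one false rejection in n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def additive_bound(alpha, n_tests):
    """The simple sum n * alpha, which approximates the exact value for small alpha."""
    return n_tests * alpha


exact = familywise_alpha(0.01, 3)
approx = additive_bound(0.01, 3)
print(round(exact, 4), round(approx, 4))

# The gap widens as alpha or the number of tests grows.
many_tests = familywise_alpha(0.05, 15)
print(round(many_tests, 3), round(additive_bound(0.05, 15), 3))
```

The second call uses the 15 tests at the .05 level discussed for the five two-way ANOVAs in Chapter 4, where the sum 15(.05) = .75 noticeably overstates the exact value.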


CHAPTER 3 


[Figure: sampling distributions of F when H0 is true and when H0 is false, with the values 2.28 and 4.51 marked on the horizontal axis]

We have shown that as Type I error decreases (from light shaded area to
dark shaded), Type II error increases (from boldfaced lined area to the total


. In doing a two-tailed test, say at .05, the alpha level is divided into two equal
portions of .025. Thus, in effect we are working at a more severe alpha
level, and therefore we will have less power for the two-tailed test.


. Using the formula for effect size, \( \hat{d} = t\sqrt{1/n_1 + 1/n_2} \), we obtain
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(a) 


[Table: estimated effect size for each study; entries illegible in source]

(b) The vast majority of the studies (8 of 10) show medium to large effect 
sizes, which would undoubtedly be of practical significance. Yet in 7 of the 
8 cases, significance was not found because of a power problem (small to 
very small group sizes). There is systematic evidence to document the su- 
periority of the combined treatment. 


. To estimate power we first find the estimated effect size \( \hat{f} \) from

\[ \hat{f} = \sqrt{(k-1)F/N} = \sqrt{4(2.03)/125} = .255 \]

Now, using Table C.4, with f = .25 and n = 25, we find that power = .58.
To obtain power at α = .10, we use Table C.8 and find that power = .71.


. (a) \( \hat{f} = \sqrt{3(5.61)/800} = .145 \). Now, using Table C.3 with f = .15 and an
average group size of 200, we find that power = .96.

(b) These results do not appear to have any practical significance. First, 
the effect size is small. Secondly, look at the size of the mean differences 
(for a scale which has a range from 10 to 50). The mean differences for all 
pairs of groups, except Jewish and Protestant 2, are about 2 or less. These 
are trivial differences on a scale with a range of 40. 


Below is the PASS 6.0 printout. 

Power does not become adequate for any of the sample sizes. Note that at 
the .05 level the power is only .34 even with 120 subjects per group! Also, 
even at the .10 level, the power is just .4626 with 120 subjects per group. 
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SELECTED PRINTOUT FROM PASS 6.0 


Insert art from p. 406 of previous edition 


CHAPTER 4 


1. A fourth advantage of a factorial design is that it can help to increase the 
generalizability of results. For example, suppose we had compared 3 treat- 
ments in a one way ANOVA and found a significant difference. Someone 
then says to us that the relative efficacy of treatments might depend on the 
sex of the subjects, and we run a factorial ANOVA (sex by treatments) to 
check this out. If we had adequate power and the interaction effect is not 
significant, we can generalize our results. 


3. The control lines for running Problem 2 are as follows: 


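* Two-way (AGE by TREAT) analysis of variance on DEP;
DATA TWOWAY;
* The trailing @@ reads several observations from each data line;
INPUT AGE TREAT DEP @@;
CARDS;
1 1 22  1 1 27  1 1 23  1 1 28  1 1 20
1 2 24  1 2 32  1 2 30  1 2 35  1 2 32
1 3 19  1 3 30  1 3 27  1 3 20  1 3 21
2 1 18  2 1 25  2 1 27  2 1 20  2 1 23
2 2 24  2 2 16  2 2 18  2 2 19  2 2 20
2 3 34  2 3 28  2 3 21  2 3 30  2 3 29
PROC PRINT;
PROC GLM;
CLASS AGE TREAT;
MODEL DEP = AGE TREAT AGE*TREAT;
MEANS AGE TREAT AGE*TREAT;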
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Selected printout from the above run is given below: 


SOURCE       DF   TYPE I SS    F VALUE   PR > F   TYPE III SS
AGE           1    45.63333      2.82     .1063     45.63333
TREAT         2    37.80000      1.17     .3284     37.80000
AGE*TREAT     2   334.06667     10.31     .0006    334.06667


Notice that the Type I and Type III sums of squares are the same, as they al- 
ways will be for equal cell size factorial ANOVA. The F values will also be 
the same, and are not repeated twice here. If we had decided to test each ef- 
fect at the .05 level, then only the interaction effect is significant, since only 
that p value is less than .05. 


5. (a) \( \hat{f} = \sqrt{2(5.57)/24} = .681 \)

The n needed to enter the table is

\[ n = [(N - rc)/c] + 1 = [(24 - 6)/3] + 1 = 7 \]

Now, using Table C.6 and f = .70 (since our estimated effect size is very
close to this value), we find that power is .86. Actually, if we had interpolated,
power would be slightly less, but the main point here is that power for
detecting the reinforcement main effect was quite good.


(b) \( \hat{f}_{AB} = \sqrt{2(1.87)/24} = .395 \)

The n needed to enter the table is

\[ n_{AB} = \frac{N - rc}{(r-1)(c-1) + 1} + 1 = \frac{24 - 6}{(2-1)(3-1) + 1} + 1 = 7 \]

(c) Given that Pukulski had less than a 50% chance of detecting a large
interaction effect, the study should be replicated with a larger sample size for
more adequate power. Using Table C.6, we see that for f = .40, power is
only .45.
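Both quantities needed to enter Cohen's tables follow the same two formulas, sketched below. The function names are hypothetical, and the F values 5.57 and 1.87 are assumptions taken from the arithmetic in this answer.

```python
import math


def effect_size_f(df_effect, f_stat, total_n):
    """Estimated effect size f = sqrt(df_effect * F / N)."""
    return math.sqrt(df_effect * f_stat / total_n)


def table_n(total_n, n_cells, df_effect):
    """Group size used to enter the power tables: (N - rc)/(df_effect + 1) + 1."""
    return (total_n - n_cells) / (df_effect + 1) + 1


N, rows, cols = 24, 2, 3

# Main effect with c - 1 = 2 degrees of freedom.
f_main = effect_size_f(cols - 1, 5.57, N)
n_main = table_n(N, rows * cols, cols - 1)

# Interaction with (r - 1)(c - 1) = 2 degrees of freedom.
df_ab = (rows - 1) * (cols - 1)
f_ab = effect_size_f(df_ab, 1.87, N)
n_ab = table_n(N, rows * cols, df_ab)

print(round(f_main, 3), n_main, round(f_ab, 3), n_ab)
```

The power itself is then read from the table for the matching α and u, as in the text.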


7. (a) For each dependent variable in a two-way design, there are 3 statistical 
tests (2 main effects and interaction effects). Since 5 two-way ANOVAs 
were done, this means 5(3) = 15 statistical tests were done. 

(b) Upper bound on overall α is \( 1 - (1 - .05)^{15} = 1 - .463 = .537 \).
(c) The investigator should be quite cautious, since the probability of at 
least a few spurious rejections is very high. 
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9. (a) We present below the rearranged means along with the row, column, 


11. 


and grand means, and the estimated interaction effects: 


                    TREATMENT                    ROW MEANS
           6.5        7.8333      8.3333         7.5556
          (-.1667)    (.5)       (-.3333)
AGE
           8.8333     8.8333     11              9.5556
          (.1667)    (-.5)       (.3333)
COLUMN     7.6667     8.3333      9.6667         8.5556 (GRAND MEAN)
MEANS


SS = 6[2(.0278) + 2(.25) + 2(.1111)] = 6(.7778) = 4.6668
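The estimated interaction effects in parentheses are each cell mean minus its row mean minus its column mean plus the grand mean. A minimal sketch of that calculation follows; `interaction_effects` is a hypothetical helper, and the cell size of 6 is an assumption read from the multiplier in the sum of squares.

```python
def interaction_effects(cell_means):
    """Return cell - row mean - column mean + grand mean for every cell."""
    n_rows = len(cell_means)
    n_cols = len(cell_means[0])
    row_means = [sum(row) / n_cols for row in cell_means]
    col_means = [
        sum(cell_means[i][j] for i in range(n_rows)) / n_rows
        for j in range(n_cols)
    ]
    grand_mean = sum(row_means) / n_rows
    return [
        [cell_means[i][j] - row_means[i] - col_means[j] + grand_mean
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Age (rows) by treatment (columns) cell means from the table.
cell_means = [
    [6.5, 7.8333, 8.3333],
    [8.8333, 8.8333, 11.0],
]
n_per_cell = 6  # assumed from the "6[...]" multiplier

effects = interaction_effects(cell_means)
ss_interaction = n_per_cell * sum(e ** 2 for row in effects for e in row)
print([[round(e, 4) for e in row] for row in effects], round(ss_interaction, 3))
```

The result agrees with the hand calculation up to rounding of the squared effects.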


(b) We follow the same process as above for calculating the sex by age 
sum of squares: 


                 AGE                ROW MEANS
           7.5556     9.6667        8.6112
          (-.0556)    (.0556)
SEX
           7.5556     9.4444        8.5000
          (.0556)    (-.0556)
COLUMN     7.5556     9.5556        8.5556 (GRAND MEAN)
MEANS


SS = 9[4(.00309)] = .1112
(c) \( SS_{age} = 18[(7.5556 - 8.5556)^2 + (9.5556 - 8.5556)^2] = 36 \)


The n to enter Cohen’s tables for the treatment main effect is given by n =
[(90 - 9)/3] + 1 = 28. The power at .05 is .52 and power at .10 is .65 (here u =
2, since there are 2 df for treatment). Thus, power is still not quite adequate,
even at the .10 level of significance. For the interaction effect, the n to enter
the table is n = [(90 - 9)/(4 + 1)] + 1 = 17 (approx.). The degrees of freedom
for interaction here is (3 - 1)(3 - 1) = 4, which recall is u in Cohen's tables.
Thus, power = .40 at .05 and .54 at .10. Assuming power would increase
roughly by the same amount (.14) in going from α = .10 to α = .15, we
estimate that power would be about .68 at α = .15.
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(a) The significant dynamism and warmth and acceptance interaction ef- 
fects indicate that the principals rate more versus less effective teachers dif- 
ferentially on each of this traits, depending on the level of schooling. 

(b) The cell means for dynamism and warmth and acceptance are 


             DYNAMISM                WARMTH & ACCEPT.
          MORE EFF   LESS EFF      MORE EFF   LESS EFF
ELEM        25.7       28.9          39.3       23.9
INTERM      27.9       22.8          35.6       26.5
SENIOR      28.2       17.6          31.7       26.7


Note that the means for dynamism increased (for the more effective 
teachers) as the school level of the principal increases, and decreased for 
the less effective teachers, as the authors hypothesized. Recall that an inter- 
action can be thought of as a difference in the differences, and here those 
differences are — 3.2, 5.1, and 10.6. 

Regarding warmth and acceptance, the difference in means for more and 
less effective teachers is sharpest for the elementary principals and de- 
creases in size as the level of the principal increases (as the authors had hy- 
pothesized). The differences are 15.4, 9.1, and 5. 

(c) Basically half of their hypotheses were confirmed, that is, that warmth 
and acceptance and dynamism would be important in distinguishing more 
versus less effective teachers. However, they also hypothesized that cre- 
ativity would discriminate for elementary school principals, while orga- 
nized demeanor would be important for intermediate and high school prin- 
cipals, and neither of these was confirmed. As a matter of fact, the 
discrepancy between more versus less effective teachers on creativity is 
sharpest for the senior-level principals. On organized demeanor the means 
for elementary, intermediate, and high school principals are 34.8, 36.8, and 
36.3. 

(d) To check which pairs of means on organized demeanor are signifi- 
cantly different one should use the Tukey procedure. You will find that 
there are no significant differences. 


(a) From the printout the following effects are significant at the .01 level: 
SEX (F = 22.067, p = .000), TREAT (F = 8.685, p = .001) and SEX * 
TREAT (F = 6.034, p = .005) 

(b) From the marginal means, it is clear that males (if males are coded as 1) 
did better than females, and that Treatment 2 was the best (assuming higher 
is better). However, the interaction tells us that things are more compli- 
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cated, and an examination of the SEX*TREAT means reveals that males do 
particularly well with Treatment 2. 


SELECTED PRINTOUT FROM SPSS FOR WINDOWS 12.0 


Insert art from p. 411 of previous edition 
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Insert art from p. 412 of previous edition 
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17. Below is selected printout from SPSS for Windows 12.0. From the table we 
can see that none of the effects are significant at the .05 level. 


Insert art from p. 413 of previous edition 
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CHAPTER 5 
UNIVARIATE REPEATED MEASURES ANALYSIS

                TREATMENTS             ROW
          1      2      3      4       MEANS
    1     5      6      2      5       4.5
    2     3      4      1      6       3.5
    3     3      7      4     10       6.0
Ss  4     6      8      3      3       5.0
    5     4      9      7      8       7.0
    6     5      7      4      9       6.25
    7     2     10      1      2       3.75
    8     4      3      2      5       3.50
          4     6.75    3      6       4.9375 (GRAND MEAN)


\[ SS_b = 8[(4 - 4.9375)^2 + (6.75 - 4.9375)^2 + (3 - 4.9375)^2 + (6 - 4.9375)^2] = 72.374 \]
\[ MS_b = 72.374/3 = 24.125 \]

Sum of squares within, adding the sums of squares for Treat 1, Treat 2, Treat 3, and Treat 4:

\[ SS_w = 12 + 39.5 + 28 + 56 = 135.5 \]

SUM OF SQUARES FOR BLOCKS

\[ SS_{bl} = 4[(4.5 - 4.9375)^2 + (3.5 - 4.9375)^2 + \cdots + (3.5 - 4.9375)^2] = 51.375 \]
\[ SS_{res} = 135.5 - 51.375 = 84.125 \]
\[ MS_{res} = 84.125/21 = 4.006 \]
\[ F = 24.125/4.006 = 6.022 \]
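The partition above (treatments, blocks, residual) can be verified directly from the raw scores. The following is a sketch with a hypothetical function name; it follows the same definitional formulas rather than any particular package's routine.

```python
def repeated_measures_anova(data):
    """Single-group repeated measures ANOVA; rows are subjects, columns treatments."""
    n = len(data)       # subjects (blocks)
    k = len(data[0])    # treatments
    grand_mean = sum(sum(row) for row in data) / (n * k)
    treat_means = [sum(row[j] for row in data) / n for j in range(k)]
    subject_means = [sum(row) / k for row in data]

    ss_treat = n * sum((m - grand_mean) ** 2 for m in treat_means)
    # Within-treatment variability, pooled over the k treatments.
    ss_within = sum(
        (row[j] - treat_means[j]) ** 2 for row in data for j in range(k)
    )
    ss_blocks = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_res = ss_within - ss_blocks

    ms_treat = ss_treat / (k - 1)
    ms_res = ss_res / ((n - 1) * (k - 1))
    return {
        "ss_treat": ss_treat, "ss_within": ss_within,
        "ss_blocks": ss_blocks, "ss_res": ss_res,
        "ms_treat": ms_treat, "ms_res": ms_res, "F": ms_treat / ms_res,
    }


# Scores for the 8 subjects under the 4 treatments, from the table.
scores = [
    [5, 6, 2, 5], [3, 4, 1, 6], [3, 7, 4, 10], [6, 8, 3, 3],
    [4, 9, 7, 8], [5, 7, 4, 9], [2, 10, 1, 2], [4, 3, 2, 5],
]
result = repeated_measures_anova(scores)
print({name: round(value, 3) for name, value in result.items()})
```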


(a) The critical value at the .05 level, on 3 and 21 degrees of freedom, is 
3.07. Therefore, we have a significant overall difference. 

(b) Tukey post hoc procedure—The critical value against which each 
mean difference is to be compared is 


\[ q_{.05;4,21}\sqrt{MS_{res}/n} = 3.95\sqrt{4.006/8} = 2.795 \]
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The only mean differences that exceed 2.795 in absolute value are for 
Groups 2 and 3, and Groups 3 and 4. Thus, these are the only pairs of 
groups that are significantly different at the .05 level with the Tukey proce- 
dure. 


. (a) First, we compute the basic quantities that are to be plugged into the
formula:

\[ \bar{s}_{ii} = (76.8 + 42.8 + 64)/3 = 61.2 \]

This is the average of the diagonal elements.

\[ \bar{s} = (76.8 + 53.2 + 69 + 53.2 + \cdots + 47 + 64)/9 = 58 \]

This is the average of all the elements in the covariance matrix.

\[ \sum\sum s_{ij}^2 = 76.8^2 + 53.2^2 + 69^2 + \cdots + 47^2 + 64^2 = 31426.56 \]

This is just the sum of all the squared elements in the matrix.

(b) The \( \bar{s}_i \) are the row averages:

\[ \bar{s}_1 = (76.8 + 53.2 + 69)/3 = 66.333 \]
\[ \bar{s}_2 = (53.2 + 42.8 + 47)/3 = 47.667 \]
\[ \bar{s}_3 = (69 + 47 + 64)/3 = 60 \]

\[ \hat{\varepsilon} = \frac{9(61.2 - 58)^2}{2[31426.56 - 6(10272.21) + 9(3364)]} \]

(c) Recall that min ε = 1/(k - 1), where k is the number of levels for
the repeated measures factor. Since k = 3 here, it follows that min ε =
1/(3 - 1) = .50.

(d) For the design, there were two groups, with eight subjects per group,
and five repeated measures. Thus we have g = 2, n = 8 and k = 5. Therefore,

\[ \tilde{\varepsilon} = \frac{8(2)(4)(.44629) - 2}{4[2(7) - 4(.44629)]} = \frac{26.56256}{48.85936} = .54365 \]

. Below we present selected printout which gives the F tests for the contrasts 


and the associated p values: 
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VARIABLE HYPOTH. MS ERROR MS F SiG OF F 
HELMERT1 24.29999 2.76889 8.77608 .016 
HELMERT2 16.53750 1.08639 15.22245 .004 
HELMERT3 .00050 .18272 .00274 .959


If overall α is set at .10, then each contrast is being tested at the .10/3 =
.0333 level of significance. Thus, the first two Helmert contrasts are signif- 
icant. 

The first Helmert contrast is testing whether the control group differs from 
the remaining 3 groups (the three treatment or drug groups here), while the 
second Helmert contrast is testing whether the effect of Drug Type I differs 
from that of the two remaining drugs (which were similar in composition). 


CHAPTER 6 


. (a) There definitely does appear to be a linear relationship. 
(c) There is not a pattern in the residuals, which indicates a linear model is 
appropriate (see Fig. A.3). 


FIGURE A.3 Plot with fitted line y = .978x + 2.17
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3. (a) If x1 enters the equation first, it will account for (.60)2 x 100, or 36% of 


the variance on y. 

(b) To determine how much variance on y predictor x1 will account for if
entered second, we need to partial out x2. Hence we compute the following
semipartial correlation:

\[ r_{y1.2(s)} = \frac{r_{y1} - r_{y2}r_{12}}{\sqrt{1 - r_{12}^2}} = \frac{.60 - .50(.80)}{\sqrt{1 - (.8)^2}} = .33 \]
\[ r_{y1.2(s)}^2 = (.33)^2 = .1089 \]

Thus, x1 accounts for about 11% of the variance if entered second.

(c) Since x1 and x2 are strongly correlated (multicollinearity), when a
predictor enters the equation greatly influences how much variance it will
account for. Here, when x1 entered first it accounted for 36% of the variance,
while it accounted for only 11% when entered second.
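The effect of entry order can be illustrated numerically. This is a sketch with a hypothetical function name, using the three correlations from this exercise (.60, .50, and .80).

```python
import math


def semipartial_r(r_y1, r_y2, r_12):
    """Semipartial correlation of y with x1, with x2 partialed out of x1."""
    return (r_y1 - r_y2 * r_12) / math.sqrt(1 - r_12 ** 2)


r_y1, r_y2, r_12 = 0.60, 0.50, 0.80

# Proportion of variance for x1 when it enters first versus second.
variance_first = r_y1 ** 2
sr = semipartial_r(r_y1, r_y2, r_12)
variance_second = sr ** 2

print(round(variance_first, 2), round(sr, 2), round(variance_second, 2))
```

Carrying full precision gives a squared semipartial of about .11, in line with the .1089 obtained in the text after rounding the correlation to .33.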


. (a) For STEPWISE regression the model selected has SIZE, NEW and 


NO-BATH as predictors. 
(b) For BACKWARD elimination the same model is selected. 


TITLE 'USING FIXED FORMAT AND TESTING SET OF PREDICTORS'.
DATA LIST FIXED/X1 1 X2 2 X3 3-4 X4 5-6 X5 7-8
  X6 9-11(2) X7 12-14 X8 15-16.
BEGIN DATA.
DATA LINES
END DATA.
LIST.
REGRESSION VARIABLES = X1 TO X8/
  DEPENDENT = X8/
  ENTER X1 X2/TEST(X3 X4 X5)/.


CHAPTER 7 


1. (a) ANCOVA is appropriate since there is a significant linear relationship
(p = .000) and the homogeneity of regression slopes assumption is tenable
(p = .521).
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(b) We do reject the hypothesis of equal adjusted means since F = 8.61 
with p = .001. 
(c) The error terms are related by the equation: 


\[ MS_w^{*} = MS_w(1 - r^2) \]


. The error term for the ANCOVA is considerably smaller: 111.72 versus 
139.98 for the ANOVA on the difference scores. The regression coefficient 
from the ANCOVA is .69876. In 5.10 it was stated that, “...whenever, the 
regression coefficient is not equal to 1, the error term for ANCOVA will be 
smaller than that for the gain score analysis and hence the ANCOVA will 
be a more sensitive or powerful analysis of variance assumption since the 
cell sizes were approximately equal, and ANOVA is known to be robust in 
this situation.” 

From 4.6, the relationship between the interaction effect size and the test 
statistic is given by 


\[ \hat{f} = \sqrt{(r-1)(c-1)F/N} \]

Therefore, \( \hat{f} = \sqrt{(2-1)(2-1)4.49/34} = .3634 \)

The effect size is fairly large. The n that we would use to enter Cohen’s
power tables is

\[ n = [(N - rc)/((r-1)(c-1) + 1)] + 1 \]
\[ n = [(34 - 4)/((2-1)(2-1) + 1)] + 1 = 16 \]

Since the degrees of freedom for interaction here is 1, we use Table C.1 and
find that power is around .50. Thus, although power was not good in this
study, nevertheless significance was found.


. The fact that the correlation is .61 and that the homogeneity of slopes as- 
sumption is tenable means that ANCOVA is appropriate. The grand mean 
for the study (assuming equal n per group) is 110. Therefore, when the 
means on the dependent variable are adjusted they will be drawn much 
closer together, causing a much smaller mean sum of squares between and 
the loss of significance. The mean of 70 for Group 1 will be adjusted down- 
ward (perhaps to a value of 67) while the mean for Group 2 will be adjusted 
upward (perhaps to a value of 63). Thus, the adjusted means for the 3 
groups would be 67, 63, and 65. 
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